cliche.services.wikipedia.crawler: Wikipedia crawler

Crawling DBpedia tables into a relational database

See also

The list of DBpedia classes
This page describes the structure and relation of DBpedia classes.

References

cliche.services.wikipedia.crawler.count_by_class(class_list)

Get the count of an ontology class

Parameters:class_list (list) – List of ontology classes
Return type:int
cliche.services.wikipedia.crawler.count_by_relation(p)

Get the count of all works

Parameters:p (list) – List of properties
Return type:int
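
The select functions documented on this page return 100 entities per page, so a count can be turned into a number of pages to request. The following is a minimal sketch, assuming the count covers the same rows that the matching select function pages through; pages_needed is a hypothetical helper, not part of the module.

```python
import math

# Page size stated in the parameter descriptions on this page.
PAGE_SIZE = 100


def pages_needed(total: int, page_size: int = PAGE_SIZE) -> int:
    """Return how many pages are needed to cover `total` rows."""
    return math.ceil(total / page_size)
```

For instance, total could be the integer returned by count_by_relation(p) for the same property list later passed to select_by_relation.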
cliche.services.wikipedia.crawler.select_by_class(s, s_name='subject', p={}, entities=[], page=1)

List the subjects of class s that have the properties given in entities

Parameters:
  • s (str) – Ontology name of subject.
  • s_name (str) – Name of subject. It doesn’t affect the results.
  • entities (list) – List of property ontologies.
  • page (int) – The offset of the query. Each page returns 100 entities.
Returns:

A list of dicts, one for each subject that has the properties in ‘entities’.

Return type:

list

For example:

select_by_class(
    s_name='author',
    s=['dbpedia-owl:Artist', 'dbpedia-owl:ComicsCreator'],
    p=['dbpedia-owl:author', 'dbpprop:author', 'dbpedia-owl:writer'],
    entities=['dbpedia-owl:birthDate', 'dbpprop:shortDescription']
)
[{
    'author': 'http://dbpedia.org/page/J._K._Rowling',
    'name': 'J. K. Rowling',
    'dob': '1965-07-31',
    'shortDescription': 'English writer. Author of the Harry ...'
}, {
    'author': ...
}]
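
Because each call returns at most 100 entities, collecting every result means requesting successive pages. The following is a minimal sketch of such a loop, not the module's own code. iter_all_rows and fetch_page are hypothetical names, and the first page number is an assumption, since the signature defaults to page=1 while one example on this page passes page=0.

```python
from typing import Callable, Iterator

# Page size stated in the parameter description above.
PAGE_SIZE = 100


def iter_all_rows(fetch_page: Callable[[int], list],
                  first_page: int = 1) -> Iterator[dict]:
    """Yield rows page by page until a short or empty page is returned.

    fetch_page is a hypothetical callable wrapping the crawler call,
    for example: lambda page: select_by_class(..., page=page).
    first_page is an assumption; change it if paging starts at 0.
    """
    page = first_page
    while True:
        rows = fetch_page(page)
        yield from rows
        # A page shorter than PAGE_SIZE is taken to be the last one.
        if len(rows) < PAGE_SIZE:
            break
        page += 1
```

Stopping on a short page avoids one extra request when the total is not a multiple of 100; when it is, the loop ends on the first empty page.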
cliche.services.wikipedia.crawler.select_by_relation(p, revision, s_name='subject', o_name='object', page=1)

Find author of something

Retrieves the list of s_name and o_name pairs that are related by one of the ontology properties in p.

Parameters:
  • p (list) – List of properties between s_name and o_name.
  • s_name (str) – Name of subject. It doesn’t affect the results.
  • o_name (str) – Name of object. It doesn’t affect the results.
  • page (int) – The offset of the query. Each page returns 100 entities.
Returns:

A list of dicts, each mapping keys to the values of one matching table row.

Return type:

list

For example:

select_by_relation(
    s_name='work',
    p=['dbpprop:author', 'dbpedia-owl:writer', 'dbpedia-owl:author'],
    o_name='author',
    page=0
)
[{
    'work': 'http://dbpedia.org/resource/The_Frozen_Child',
    'author': 'http://dbpedia.org/resource/József_Eötvös\nhttp://dbpedia.org/resource/Ede_Sas'
}, {
    'work': 'http://dbpedia.org/resource/Slaves_of_Sleep',
    'author': 'http://dbpedia.org/resource/L._Ron_Hubbard'
}]

When a row has more than one item, the items are joined by a line break (EOL).
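
Since multiple items arrive joined by line breaks in a single string, a caller will usually want to split them again. The following is a minimal sketch, assuming every value in a row is a string; split_items is a hypothetical helper, not part of the module.

```python
def split_items(row: dict) -> dict:
    """Return a copy of `row` with every value split on line breaks."""
    return {key: value.splitlines() for key, value in row.items()}


# Sample row taken from the example above.
row = {
    'work': 'http://dbpedia.org/resource/The_Frozen_Child',
    'author': 'http://dbpedia.org/resource/József_Eötvös\n'
              'http://dbpedia.org/resource/Ede_Sas',
}
authors = split_items(row)['author']
```

Every value becomes a list, including single-item ones, so callers can treat all rows uniformly.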

cliche.services.wikipedia.crawler.select_property(s, s_name='property', return_json=False)

Get the properties of an ontology.

Parameters:s (str) – Ontology name of subject.
Returns:list of objects which contain properties.
Return type:list

For example:

select_property(s='dbpedia-owl:Writer', return_json=True)
[{
    'property': 'rdf:type'
}, {
    'property': 'owl:sameAs'
}]
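
The result shown above can be flattened into a plain list of property names. The following is a minimal sketch, assuming that s_name is used as the dictionary key in each returned row, as the example output suggests; property_names is a hypothetical helper, not part of the module.

```python
def property_names(rows: list, s_name: str = 'property') -> list:
    """Collect the value stored under `s_name` from each row."""
    return [row[s_name] for row in rows]


# Sample rows copied from the example output above.
rows = [{'property': 'rdf:type'}, {'property': 'owl:sameAs'}]
names = property_names(rows)
```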