cliche.services.wikipedia.crawler — Wikipedia crawler
Crawling DBpedia tables into a relational database
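The module's stated purpose is to load crawled DBpedia data into a relational database. Below is a minimal sketch of that final step using Python's built-in sqlite3 module. It assumes rows arrive as the list of dicts returned by the select_* functions. The helper store_rows, the table name work_author and the one-TEXT-column-per-key schema are hypothetical and are not the crawler's actual storage code.

```python
import sqlite3


def store_rows(conn, table, rows):
    """Insert a list of dicts into `table`, creating it if needed.

    Hypothetical helper: every key of the first row becomes a TEXT
    column. Table and column names are interpolated into the SQL, so
    they must come from trusted code, not from crawled data.
    """
    if not rows:
        return 0
    columns = list(rows[0].keys())
    col_defs = ", ".join('"%s" TEXT' % c for c in columns)
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, col_defs))
    col_names = ", ".join('"%s"' % c for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # Values are passed through "?" placeholders, so quotes in crawled data are safe.
    conn.executemany(
        'INSERT INTO "%s" (%s) VALUES (%s)' % (table, col_names, placeholders),
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
rows = [{"work": "http://dbpedia.org/resource/Slaves_of_Sleep",
         "author": "http://dbpedia.org/resource/L._Ron_Hubbard"}]
store_rows(conn, "work_author", rows)
print(conn.execute('SELECT work, author FROM "work_author"').fetchall())
```

Keeping every column as TEXT matches the string values in the documented examples. A real schema would likely want typed columns, for instance a DATE for birth dates.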
See also
- The list of DBpedia classes
- This page describes the structure of DBpedia classes and the relations between them.
References
cliche.services.wikipedia.crawler.count_by_class(class_list)
Get the count of resources in the given ontology classes.
Parameters: class_list (list) – List of ontology classes
Return type: int
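count_by_class presumably issues a SPARQL COUNT query against DBpedia. The sketch below shows what such a query could look like. It assumes the classes are combined with UNION and that the endpoint already knows the dbpedia-owl: prefix. build_count_by_class_query and parse_count are hypothetical helpers. parse_count reads the W3C SPARQL 1.1 JSON results format to produce the documented int, and the sample response is made up.

```python
def build_count_by_class_query(class_list):
    """Return a SPARQL query counting resources typed with any class.

    Hypothetical helper. Assumes the endpoint predefines the prefixes
    used in class_list (e.g. dbpedia-owl:); otherwise PREFIX lines are needed.
    """
    if not class_list:
        raise ValueError("class_list must not be empty")
    # One "?s a <class>" pattern per class, joined with UNION. DISTINCT keeps
    # a resource that matches several classes from being counted twice.
    patterns = " UNION ".join("{ ?s a %s }" % cls for cls in class_list)
    return "SELECT (COUNT(DISTINCT ?s) AS ?count) WHERE { %s }" % patterns


def parse_count(result):
    """Read the ?count value from a SPARQL 1.1 JSON result as an int."""
    return int(result["results"]["bindings"][0]["count"]["value"])


query = build_count_by_class_query(["dbpedia-owl:Artist", "dbpedia-owl:ComicsCreator"])
# Made-up response, shaped like the W3C JSON results format.
sample = {"head": {"vars": ["count"]},
          "results": {"bindings": [{"count": {
              "type": "literal",
              "datatype": "http://www.w3.org/2001/XMLSchema#integer",
              "value": "42"}}]}}
print(query)
print(parse_count(sample))
```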
cliche.services.wikipedia.crawler.count_by_relation(p)
Get the count of all works.
Parameters: p (list) – List of properties
Return type: int
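For count_by_relation, one way to count works is to count the distinct subjects of any of the given properties. This sketch is a hypothetical helper, not the module's code. It uses a SPARQL 1.1 VALUES block, so a work that is linked by both dbpprop:author and dbpedia-owl:author is counted only once.

```python
def build_count_by_relation_query(p):
    """Return a SPARQL query counting distinct subjects of any property in p.

    Hypothetical helper. A SPARQL 1.1 VALUES block binds ?p to each
    property in turn; COUNT(DISTINCT ?s) counts a work once even if
    several of the properties link it to an author.
    """
    if not p:
        raise ValueError("p must not be empty")
    return ("SELECT (COUNT(DISTINCT ?s) AS ?count) WHERE { "
            "VALUES ?p { %s } ?s ?p ?o }" % " ".join(p))


print(build_count_by_relation_query(
    ["dbpprop:author", "dbpedia-owl:writer", "dbpedia-owl:author"]))
```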
cliche.services.wikipedia.crawler.select_by_class(s, s_name='subject', p={}, entities=[], page=1)
List resources of the classes in s, together with the values of the properties given in entities.
Returns: list of dicts, one per resource, mapping keys to the values of the requested entities.
Return type: list
For example:
select_by_class(s_name='author', s=['dbpedia-owl:Artist', 'dbpedia-owl:ComicsCreator'], p=['dbpedia-owl:author', 'dbpprop:author', 'dbpedia-owl:writer'], entities=['dbpedia-owl:birthDate', 'dbpprop:shortDescription'])
[{'author': 'http://dbpedia.org/page/J._K._Rowling', 'name': 'J. K. Rowling', 'dob': '1965-07-31', 'shortDescription': 'English writer. Author of the Harry ...'}, {'author': ...}]
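The rows in the select_by_class example look like flattened SPARQL result bindings. The sketch below shows that flattening. It assumes the query's subject variable is ?s and renames it to s_name. bindings_to_rows and the sample response are hypothetical, and the input follows the W3C SPARQL 1.1 JSON results format.

```python
def bindings_to_rows(result, s_name="subject"):
    """Flatten SPARQL 1.1 JSON result bindings into plain dicts.

    Hypothetical helper. Assumes the subject variable is ?s and renames
    it to s_name; other variables keep their names. Variables left
    unbound (e.g. by OPTIONAL) are absent from the binding, so they are
    simply missing from the row.
    """
    rows = []
    for binding in result["results"]["bindings"]:
        row = {}
        for var, term in binding.items():
            row[s_name if var == "s" else var] = term["value"]
        rows.append(row)
    return rows


# Made-up response in the W3C JSON results format.
sample = {
    "head": {"vars": ["s", "name"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://dbpedia.org/resource/J._K._Rowling"},
         "name": {"type": "literal", "xml:lang": "en", "value": "J. K. Rowling"}},
        {"s": {"type": "uri", "value": "http://dbpedia.org/resource/Ede_Sas"}},
    ]},
}
print(bindings_to_rows(sample, s_name="author"))
```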
cliche.services.wikipedia.crawler.select_by_relation(p, revision, s_name='subject', o_name='object', page=1)
Find subjects and the objects they are related to, e.g. the author of a work.
Retrieves a list of s_name and o_name pairs, where the relation p is given as a list of ontology properties.
Returns: list of dicts, each mapping s_name and o_name to the values of one fetched table row.
Return type: list
For example:
select_by_relation(s_name='work', p=['dbpprop:author', 'dbpedia-owl:writer', 'dbpedia-owl:author'], o_name='author', page=0)
[{ 'work':'http://dbpedia.org/resource/The_Frozen_Child', 'author': 'http://dbpedia.org/resource/József_Eötvös http://dbpedia.org/resource/Ede_Sas' },{ 'work':'http://dbpedia.org/resource/Slaves_of_Sleep', 'author': 'http://dbpedia.org/resource/L._Ron_Hubbard' }]
When a row has more than two items (for example, a work with several authors), the extra values are combined into one field, separated by EOL.
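The merging rule for select_by_relation can be sketched as follows. The sketch assumes the query yields one (subject, object) pair per result and that EOL means a newline character. merge_rows is a hypothetical helper, not the module's implementation.

```python
def merge_rows(pairs, s_name="subject", o_name="object"):
    """Group (subject, object) pairs so that each subject yields one row.

    Hypothetical sketch of the documented rule: when a subject has
    several objects, they are joined with a newline into one field.
    Subjects keep the order of their first appearance.
    """
    grouped = {}
    for s, o in pairs:
        grouped.setdefault(s, []).append(o)
    return [{s_name: s, o_name: "\n".join(objs)} for s, objs in grouped.items()]


pairs = [
    ("http://dbpedia.org/resource/The_Frozen_Child", "http://dbpedia.org/resource/József_Eötvös"),
    ("http://dbpedia.org/resource/The_Frozen_Child", "http://dbpedia.org/resource/Ede_Sas"),
    ("http://dbpedia.org/resource/Slaves_of_Sleep", "http://dbpedia.org/resource/L._Ron_Hubbard"),
]
for row in merge_rows(pairs, s_name="work", o_name="author"):
    print(row)
```

The sketch relies on dicts preserving insertion order, which holds from Python 3.7 onward. It does not deduplicate, so an author reached through two different properties would appear twice.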
cliche.services.wikipedia.crawler.select_property(s, s_name='property', return_json=False)
Get the properties of an ontology.
Parameters: s (str) – Ontology name of the subject.
Returns: list of objects which contain properties.
Return type: list
For example:
select_property(s='dbpedia-owl:Writer', return_json=True)
[{ 'property' : 'rdf:type' },{ 'property' : 'owl:sameAs' }]
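The example output of select_property lists properties such as rdf:type and owl:sameAs, which are properties used by instances of the class. Under that interpretation, a query returning such rows could look like the sketch below. build_select_property_query and property_rows are hypothetical. The sketch does not model the return_json flag, nor any compaction of full URIs into prefixed names like rdf:type.

```python
def build_select_property_query(s, s_name="property"):
    """Return a SPARQL query listing properties used by instances of class s.

    Hypothetical helper; the result variable is named after s_name so
    rows can be keyed the same way as in the example above.
    """
    return ("SELECT DISTINCT ?%s WHERE { ?x a %s . ?x ?%s ?o }"
            % (s_name, s, s_name))


def property_rows(result, s_name="property"):
    """Turn SPARQL 1.1 JSON bindings into [{s_name: value}, ...]."""
    return [{s_name: b[s_name]["value"]} for b in result["results"]["bindings"]]


print(build_select_property_query("dbpedia-owl:Writer"))
```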