How to run crawler¶
This tutorial covers how to run the cliche crawlers.
Running TVTropes crawler¶
You can run TVTropes crawler using cliche crawler command with celery worker:
$ celery worker -A cliche.services.tvtropes.crawler \
--config CONFIG_FILENAME_WITHOUT_EXT
$ cliche crawler
with subcommands you can provide options:
- celery worker
- It runs celery worker to crawl links. You can supply
--purge
option for purging pending work queue, and-f LOG_FILE
to save logs into a file. - cliche crawler
- You have to provide config file with
-c CONFIG_FILE
option orCLICHE_CONFIG
environmental variable. Config option must be provided beforecrawler
subcommand.
when the crawler is first run, it will fetch and populate the celery queue with links from TVTropes Index Report. If there is already some crawled links in the database, the crawler will skip this step and populate the queue from the database.
Running Wikipedia crawler¶
You can run Wikipedia crawler in the same way using cliche crawler command with celery worker:
$ celery worker -A cliche.services.wikipedia.crawler \
--config dev.py
$ cliche sync wikipedia -c CONFIG_FILENAME_WITHOUT_EXT
It also provides same options.