cliche.celery — Celery-backed task queue worker

Sometimes a web app has to provide time-consuming features that cannot respond to the user immediately (and we define “immediately” as “within a second or two” here). Such work should be queued and then processed by background workers. Celery does that in a natural way.

We use this in several places, such as resampling images to make thumbnails or crawling ontology data from other services. Such tasks definitely cannot respond “immediately”.

See also

What kinds of things should I use Celery for? — Celery FAQ
Answers what kinds of benefits Celery provides.
Queue everything and delight everyone
This article describes why you should use a queue in a web application.

How to define tasks

In order to defer some types of work, you have to make the corresponding function a task. It’s not a big deal; just attach a decorator to it:

from cliche.celery import celery

@celery.task(ignore_result=True)
def do_heavy_work(some, inputs):
    '''Do some heavy work.'''
    ...

How to defer tasks

It’s similar to an ordinary function call except that it uses the delay() method (or the apply_async() method) instead of the call operator:

do_heavy_work.delay('some', inputs='...')

That call will be queued and sent to one of the distributed workers, which means the argument values are serialized using JSON. If any argument value isn’t serializable, it will raise an error. Simple objects like numbers, strings, tuples, lists, and dictionaries are safe to serialize. On the other hand, entity objects (instances of cliche.orm.Base and its subclasses) mostly fail to serialize, so use primary key values like the entity id instead of the object itself.
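For example, given a hypothetical Work entity loaded through a hypothetical session (neither name is part of the actual API), pass its primary key rather than the entity itself:

work = session.query(Work).get(1234)   # session and Work are illustrative

do_heavy_work.delay(work)      # wrong: the ORM entity isn't JSON-serializable
do_heavy_work.delay(work.id)   # right: pass the primary key value instead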

What things are ready for a task?

Every deferred call of a task starts from an equivalent initial state. On the other hand, there are several things that are not ready:

  • Flask’s request context isn’t set up for each task. You have to deal with it explicitly, using the request_context() method, if you want to use context locals like flask.request. See also The Request Context.
  • The physical machine may differ from the web environment. Total memory, CPU capacity, the number of processors, IP address, operating system, Python VM (PyPy or CPython), and many other environmental factors can also vary. Assume nothing about these variables.
  • Hence global state (e.g. module-level global variables) is completely isolated from the web environment that queued the task. Don’t depend on such global state; see the sketch after this list.
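As a rough sketch of this constraint (the task, its arguments, and the _settings_cache dictionary below are made up for illustration; get_session() is documented in the references at the end of this page), build every resource inside the task body and derive everything else from the task arguments:

from cliche.celery import celery, get_session

# Anti-pattern: a module-level cache filled by the web process.  The worker
# runs in a separate process, possibly on another machine, so a task would
# always see this dictionary in its pristine, empty state.
_settings_cache = {}

@celery.task(ignore_result=True)
def crawl_ontology(entity_id, source_url):
    '''Hypothetical task that rebuilds its own state from its arguments.'''
    session = get_session()      # a fresh session for this worker process
    ...                          # fetch source_url and update the entity row
    session.commit()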

How to run Celery worker

The celery worker program (formerly celeryd) takes a Celery app object as its entry point, and Cliche’s app object is cliche.celery.celery. You can omit the variable name and the submodule name, leaving only cliche. Execute the following command in the shell:

$ celery worker -A cliche --config dev.cfg.yml
 -------------- celery@localhost v3.1.13 (Cipater)
---- **** -----
--- * ***  * -- Darwin-13.3.0-x86_64-i386-64bit
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app:         cliche.celery:0x1... (cliche.celery.Loader)
- ** ---------- .> transport:   redis://localhost:6379/5
- ** ---------- .> results:     disabled
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ----
--- ***** ----- [queues]
 -------------- .> celery           exchange=celery(direct) key=celery


[2014-09-12 00:31:25,150: WARNING/MainProcess] celery@localhost ready.

Note that you should pass the same configuration file (the --config option) as the one passed to the WSGI application. It should contain DATABASE_URL and so on.

References

class cliche.celery.Loader(app, **kwargs)

The loader used by the Cliche app.

cliche.celery.get_database_engine() → sqlalchemy.engine.base.Engine

Get a database engine.

Returns: a database engine
Return type: sqlalchemy.engine.base.Engine

cliche.celery.get_session() → sessionmaker(class_='Session', bind=None, expire_on_commit=True, autoflush=True, autocommit=True)

Get a database session.

Returns: a database session
Return type: Session

cliche.celery.get_raven_client() → raven.base.Client

Get a raven client.

Returns: a raven client
Return type: raven.Client
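As a rough sketch of how these helpers might fit together inside a task (the task name, its argument, and the error-handling policy are illustrative, not prescribed by the API):

from cliche.celery import celery, get_raven_client, get_session

@celery.task(ignore_result=True)
def sync_ontology(service_url):
    '''Hypothetical task combining the helpers documented above.'''
    session = get_session()
    try:
        ...  # crawl service_url and update entities through the session
        session.commit()
    except Exception:
        # Report the failure to Sentry via the raven client, then re-raise
        # so Celery still marks the task as failed.
        get_raven_client().captureException()
        raise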