ThreadPoolExecutor context manager with nested tasks
Introduction
Python's ThreadPoolExecutor's context manager is a really neat way to run a bunch of (I/O) work in a thread pool, and then clean everything up when the context is exited.
Something like this:
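A minimal sketch of the pattern (the names and workload here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    # stand-in for some I/O-bound task
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(work, n) for n in range(5)]
    results = [f.result() for f in futures]
# on exit, the pool has waited for the work and torn down its threads
print(results)  # [0, 1, 4, 9, 16]
```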
Where did my tasks go, attempt 1
Recently at work, however, I had to debug a case where the executor appeared to discard tasks that had been submitted later.
This was unexpected, because the documentation stated that the context manager would by default do shutdown(wait=True), and hence wait for all work to finish.
At a first trace through the TPE code, it seemed that the thread _worker would read the executor's shutdown flag, and then just finish its current task. However, I had missed a continue in the _worker's while loop, and hence the fact that at shutdown it would first empty the queue until reaching a None task, and then exit.
Long story short: TPE does exactly what you would expect. By default, at context manager exit it lets all the threads empty out the queue, and only then completes the exit.
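The shape of that loop can be sketched like this (a simplification, not the actual stdlib _worker code):

```python
import queue
import threading

def worker(q, results):
    # simplified shape of ThreadPoolExecutor's _worker: keep running
    # tasks; a None item is the shutdown sentinel
    while True:
        item = q.get()
        if item is not None:
            results.append(item())
            continue  # the continue I had missed: loop back for more work
        # None received: re-propagate the sentinel for other workers, exit
        q.put(None)
        return

q = queue.Queue()
results = []
for n in range(5):
    q.put(lambda n=n: n * n)
q.put(None)  # shutdown sentinel, queued after all the real work

t = threading.Thread(target=worker, args=(q, results))
t.start()
t.join()
print(results)  # all five tasks ran before the sentinel stopped the worker
```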
Where did my tasks go, attempt 2
In our case, it turned out that the problem was a recursive task that kept submitting itself as it traversed a possibly very deep hierarchical structure.
When the outer-level TPE's context manager exits, it waits for all tasks that were submitted at that level, and then pushes the None task which will lead to the thread workers terminating; but child tasks submitted by those tasks as they finish obviously come after the None values, and hence get discarded.
I’ve added a self-contained demonstrator below.
The upshot of all of this is that if your TPE tasks themselves submit further tasks, you must keep the context open long enough for everything to be submitted before it closes.
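One way to do that (a sketch, not the only option; the tiny tree here is illustrative) is to have each task hand back the futures of the children it submitted, and keep waiting on the driver side until nothing is pending:

```python
import concurrent.futures

# hypothetical hierarchy for the sketch: node -> children
TREE = {"root": ["a", "b"], "a": ["a1"], "b": [], "a1": []}
visited = []

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    def visit(node):
        visited.append(node)
        # return the child futures so the driver can keep waiting on them
        return [pool.submit(visit, child) for child in TREE[node]]

    pending = {pool.submit(visit, "root")}
    while pending:
        done, pending = concurrent.futures.wait(pending)
        for fut in done:
            pending |= set(fut.result())  # adopt the children's futures
    # the context only exits once every level has been submitted and run

print(sorted(visited))  # ['a', 'a1', 'b', 'root']
```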
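A self-contained demonstrator along these lines (my sketch; the tiny tree and the sleep are illustrative, there just to make the race deterministic):

```python
from concurrent.futures import ThreadPoolExecutor
import time

# hypothetical hierarchy for the demo: node -> children
TREE = {"root": ["a", "b"], "a": ["a1"], "b": [], "a1": []}
visited = []

def visit(node, pool):
    time.sleep(0.05)  # let the context manager begin shutting down first
    visited.append(node)
    for child in TREE[node]:
        # submitted while shutdown is already in progress; on recent
        # CPython this raises RuntimeError('cannot schedule new futures
        # after shutdown'), which is captured into the parent's future
        # and never surfaces because nobody inspects that future
        pool.submit(visit, child, pool)

with ThreadPoolExecutor(max_workers=2) as pool:
    pool.submit(visit, "root", pool)
    # the context exits immediately, triggering shutdown(wait=True)

print(visited)  # only ['root'] - the children never ran, with no error shown
```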