Python subprocess example: running a background subprocess with non-blocking output processing

This example runs rsync as a subprocess in Python using subprocess.Popen(). It is slightly more complicated than most other Popen() examples because the output of the rsync is directed into the logging module in a way that is non-blocking, so that the output is logged as rsync runs rather than after rsync finishes, and the python script is free to do other things while rsync runs in the background.

This is my first time getting back to Python after three years. For 18 months at AMD, I dug fairly deeply into Perl. My own experience with Perl was that as I mastered the language's syntax (not easy to do, since it is often strange and unpredictable), my code became more and more difficult to read and understand. I have not had that problem with Python, which I have spent a couple of years with, so for my current project, setting up an infrastructure for automated manufacturing of art frames, I am doing everything in Python. The first task is setting up a mongodb-backed, django-administered system to mirror RPM distros through a combination of either rsync or grinder. I haven't found a widely-used rsync module for Python not written in (slow) native Python code, so instead I wrote a wrapper for the rsync executable, the meat of which is shown here.

The rsync wrapper module

The trick in this example is to create two threads that use the normal blocking functions to read stdout and stderr. When the function returns a line of text, the thread sends the line of text to a logging object, and then reads the next line.

The function could send the output anywhere, not necessarily a logging object. In the links below, there is an example of sending it to a Queue object which has non-blocking functions to poll for output. Be sure that whatever code processes the output is thread safe. The documentation for the logging module indicates it is thread safe, so there's nothing to worry about in this example.

The example could be further generalized to take the same arguments as Popen(); then it could be used to capture output from any process.

The rsync object subclasses subprocess.Popen so that the python script can interact with it using the normal Popen object methods like wait(), poll() and terminate().

The __init__() function constructs the rsync command line, and then calls subprocess.Popen() to execute it. Note here that the Popen() call passes the PIPE object in through the stdout and stderr arguments. This makes stdout and stderr available as stream objects to pass to log_thread() in the folloving lines.

The log_thread() calls pass the stdout and stderr streams with their respective logging functions to be used by the thread function. log_thread() defines the thread function log_output() and initializes and runs the thread.

Here is the code for the rsync module. Following that is example code for calling it from a python script.

import subprocess
import threading
import time
from logging import getLogger

LOG = getLogger('rsync')

class rsync(subprocess.Popen):
    """
    Run rsync as a subprocess sending output to a logger.
    This class subclasses subprocess.Popen
    """

    def log_thread(self,pipe,logger):
        """
        Start a thread logging output from pipe
        """

        # thread function to log subprocess output
        def log_output(out, logger):
            for line in iter(out.readline, b''):
                logger(line.rstrip('\n'))

        # start thread
        t = threading.Thread(target=log_output,
                             args=(pipe, logger))
        t.daemon = True # thread dies with the program
        t.start()

    def __init__(self,src,dest):
        # construct the command line
        cmd = ['/usr/bin/rsync', '-rlptvHz',
               #'--dry-run',
               '--size-only',
               '--delete',
               # '--bwlimit=256',
               src, dest ]

        # spawn the rsync process
        super(rsync,self).__init__(
            cmd, shell=False,
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            bufsize=1, close_fds='posix')

        LOG.debug("Started rsync subprocess, pid %s" % self.pid)
        LOG.debug("Command:  '%s'" % "','".join(cmd))

        # start stdout and stderr logging threads
        self.log_thread(self.stdout,LOG.info)
        self.log_thread(self.stderr,LOG.warn)

Using the rsync wrapper module in a script

To call this from a python program is fairly simple. Here, a CLI class contains the main() function, which spawns the rsync process and then calls wait().

The CLI is created as a class because it makes it easy to create an interrupt handler that calls self.rsync.terminate() on the subprocess, set self.stop to True, and set self.exitval if necessary. to end the main loop. Describing the interrupt handler implementation is the subject for a future post.

class CLI:
    def main(self):
        self.exitval = 0
        from rsync import rsync
        for src, dest in (('srchost:/tmp/foo/','/tmp/foo'),('srchost:/tmp/bar/','tmp/bar')):
            # stop if we were told to
            if self.stop:
                break

            # kick off the process, attach object to self; wait for process to exit
            self.rsync = rsync(src,dest)
            if self.rsync.wait()
                LOG.warn("rsync (src: %s, dest: %s) exited with code %d" %
                    (src,dest,self.rsync.returncode))
                self.exitval = 1
    exit(self.exitval)

if __name__ == "__main__":
    try:
        sys.exit(abs(CLI().main() or 0))

Links

Subprocess module docs
A thorough page of examples for using the subprocess module
A stackoverflow page on non-blocking reads from a subprocess pipe

2 Comments

DanielJuly 5th, 2012 at 3:07 pm

Hi there,
I have the same problem to solve and this information is very helpful for me.
I have a suggestion for you:
May be you should use code wrapper in your blog where you put python code. It is not easy to see which part of the python code is part of the function or part of the main code (there is no ” ” tab/space)
Take care

jmanJuly 5th, 2012 at 11:25 pm

Yeah, you’re right. I pasted that in without going back and looking at it. I actually did have ‘code’ tags in there, but they weren’t doing what I expected. ‘pre’ tags did the trick, though. Thanks Daniel!

Leave a comment

Your comment