Python logging from multiple processes

Posted on August 13th, 2009

Had a rough day today. I just wanted a log. Not just any log mind you, but one that could handle writes from multiple processes running at the same time. A naive me would have put Python’s basic log handler within each process and then watched as each of the processes crashed and burned because of conflicting disk write access.

But I’ve learned from my past lessons – now I use Python’s SocketHandler for my logging needs. Here’s a basic snippet below to get you started.

# ============== Socket Logger =============== #
import logging
import logging.handlers # you NEED this line
import os
import sys

# Name the logger after the process id and the first command-line argument
logger = logging.getLogger("%s_%s" % (os.getpid(), sys.argv[1]) )
logger.setLevel(logging.DEBUG)
# Send every record over TCP to the log server instead of writing to disk
socketHandler = logging.handlers.SocketHandler('localhost',
                    logging.handlers.DEFAULT_TCP_LOGGING_PORT)
logger.addHandler(socketHandler)
# ============== Socket Logger =============== #

In a nutshell, the snippet gets a logger instance, sets its logging level to DEBUG, and then attaches a SocketHandler to it. This means that whenever any piece of your code calls logger.debug("this is my log message"), the message is sent over a TCP socket to the server sitting on the other end, which handles writing your log messages to disk. DEFAULT_TCP_LOGGING_PORT is 9020, so with the snippet above the server is expected at localhost:9020.
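Since every process needs the same few lines, one way to package them is a small helper. This is only a sketch: the function name get_process_logger and its label argument are made up for this example, and it assumes a log server is (or will be) listening on the given host and port.

```python
import logging
import logging.handlers
import os


def get_process_logger(label, host="localhost",
                       port=logging.handlers.DEFAULT_TCP_LOGGING_PORT):
    """Return a logger named "<pid>_<label>" that sends its records over TCP.

    get_process_logger and label are hypothetical names for this sketch.
    """
    logger = logging.getLogger("%s_%s" % (os.getpid(), label))
    logger.setLevel(logging.DEBUG)
    # Attach the SocketHandler only once, even if the helper is called again
    already_attached = any(
        isinstance(h, logging.handlers.SocketHandler) for h in logger.handlers
    )
    if not already_attached:
        logger.addHandler(logging.handlers.SocketHandler(host, port))
    return logger
```

Each process would then call something like get_process_logger("worker") once and log through the returned object. The handler only opens its connection when the first record is emitted, so creating the logger does not require the server to be up yet.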

The Log Server

I was surprised that Python didn’t already have a canonical implementation of a logging server to pair with SocketHandler in the standard library. Luckily there’s a pretty neat project that does just that: python-loggingserver. As a bonus it comes with a web interface to view your logs. On most systems, you can access the website by going to http://localhost:9021/ once you’ve started the server.

It requires the Twisted networking library, so if you don’t have it installed yet, run:

sudo easy_install twisted

Then svn checkout the project:

svn checkout http://python-loggingserver.googlecode.com/svn/trunk/ logserver

Start the server:

cd logserver
twistd --pidfile=loggingserver.pid --logfile=loggingserver.log --python=loggingserver.py

Now you’re ready to log. You can now use the code snippet posted at the beginning of this article or just use the prepackaged testing script to send messages to the server:

python loggingtest.py this_is_one_process

View the results at http://localhost:9021
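If installing Twisted is not an option, the receiving side can also be sketched with the standard library alone. SocketHandler sends each record as a 4-byte big-endian length followed by a pickled dictionary, so a receiver only has to read that framing and rebuild the record. The sketch below is not how python-loggingserver works internally; the names LogRecordHandler, make_server and the "logserver" sink logger are assumptions for this example. Unpickling data from the network is unsafe, so this should only ever listen on a trusted interface such as localhost.

```python
import logging
import pickle
import socketserver
import struct


class LogRecordHandler(socketserver.StreamRequestHandler):
    """Handle one connection from a logging.handlers.SocketHandler client."""

    def handle(self):
        # "logserver" is a hypothetical logger name; attach your file
        # handler to it (or to the root logger) to decide where records go.
        sink = logging.getLogger("logserver")
        while True:
            header = self.rfile.read(4)
            if len(header) < 4:
                break  # client closed the connection
            (length,) = struct.unpack(">L", header)
            payload = self.rfile.read(length)
            if len(payload) < length:
                break  # connection dropped in the middle of a record
            # WARNING: pickle.loads is only acceptable for trusted senders
            record = logging.makeLogRecord(pickle.loads(payload))
            sink.handle(record)


def make_server(host="localhost", port=9020):
    """Build (but do not start) a threaded TCP server for log records."""
    server = socketserver.ThreadingTCPServer((host, port), LogRecordHandler)
    server.daemon_threads = True
    return server
```

To run it standalone you would configure a handler first, for example with logging.basicConfig(filename="combined.log"), and then call make_server().serve_forever(). Because only the server process writes to the file, the sending processes never compete for it.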

Happy logging!



Save Your Hands

Posted on July 21st, 2009

Here’s a brief tip: Rebind your ctrl key to capslock.

Save yourself from the pain. The contorted positions we developers strain our hands into will eventually break them – Emacs & Textmate users, you know what I’m talking about. Do yourself a favor and rebind your control key to capslock. You can do this under MacOSX by going to System Preferences > Keyboard & Mouse > Keyboard > Modifier Keys. Change the settings to look like the following and you’re set.

[Screenshot: Modifier Keys settings in the Keyboard & Mouse preferences]



Franchising: Running Multiple Sites from one Django Codebase

Posted on July 14th, 2009

While surveying the modularity and reusability of code written with Django, I’ve observed that projects usually travel down two divergent paths of evolution. Either they start small and grow into behemoths as developers add feature-after-feature, or they stop growing after a certain point and are packaged into reusable components. Today I propose a third route that weaves between these two extremes and allows you to reuse your existing code by running multiple websites from a single codebase. In essence, I want you to feel like you’re developing one site, when in reality you’re making multiple. For lack of a better term, I’ll call this process Franchising your Django project.

Why franchise your project?

When my organization began licensing our research-tracking software to multiple clients, each of them wanted to make minor adjustments to various aspects of our software suite.

Our initial response to this situation was to branch the original code into several different versions for each site. In this way, when a client wanted minor changes, we would make the modification to their respective code branch. This approach worked fine as long as we weren’t adding any new features to our software. The problem, however, was that we were.

As more clients began licensing our software and as we continued to make improvements to it, the process of syncing those changes between all the different code branches became unmanageable. Despite heroic efforts, our svn-fu was not up to par.

Franchising was our answer to this tangled mess.

Defining “Franchise”

The goal was to develop a system capable of sharing common components between multiple sites from a single code-base while keeping it flexible enough for us to make minor changes to each specific site. The critical question that we had to answer was, “What did we need to keep in common between all sites, and what were the things we wanted to change from site to site?” The table below is our attempt at redefining the above question in terms of components common to a Django project.

Django Components in a Franchised Project

Component   Globally-Shared   Site-Specific
database                      x
models      x
urls        x                 x
views       x                 x
settings    x                 x
templates   x                 x

From this table, three important pieces of information stand out. One, most Django components – the settings, views, urls, and templates – will have both a globally shared aspect and a site-specific aspect. Two, all sites, despite their differences in templates and views, will have a globally shared model schema. Why? Because this allows us to develop our Django project as if we were running only one site. Finally, each site will have its own unique database since each site will have its own unique data.

With this in mind, a typical directory layout of a “franchised” Django project might look something like this:

Franchised Django Project – Directory Layout

In the top-level directory sit all of the globally shared settings, applications, and urls. Under that, a “site_overloads” folder contains all of the site-specific components such as databases, urls, and templates.

Implementing a Franchised Django Site

Now the trickiest part about franchising your Django project is getting the globally shared components and the site-specific components working seamlessly with each other. Inherent within this problem is the challenge of figuring out where to route users once they reach one of your sites.

Going along with our research-tracker example, imagine that a user has typed “http://research.nasa.com/” into their browser. If it were possible to dynamically configure Django for a specific site’s settings before responding, we could essentially run many sites from a single codebase by switching between the different configurations. Luckily for us, Apache and most other web servers allow you to set environment variables depending on the host name of the incoming request. Here’s how you do it within Apache’s /etc/apache2/httpd.conf file:

/etc/apache2/httpd.conf

<VirtualHost *:80>
        SetHandler python-program
        PythonHandler django.core.handlers.modpython
        SetEnv DJANGO_SETTINGS_MODULE settings
        SetEnv OVERLOAD_SITE nasa
        PythonPath "['/var/www/data/research_tracker'] + sys.path"
        ServerName research.nasa.com
</VirtualHost>

Using the SetEnv directive, we set a variable called “OVERLOAD_SITE” to “nasa”. Now, anytime someone requests a page from “http://research.nasa.com” Apache will automatically set the variable before handing off the request to Django.

By appending the following snippet to our top-level settings.py file, we can use the environment variable to dynamically load the desired site-specific settings.

globally-shared settings.py

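import os

# Name of the site to load, set by the web server (or your shell)
OVERLOAD_SITE = os.environ.get('OVERLOAD_SITE')
if OVERLOAD_SITE:
    OVERLOAD_SITE_MODULE = "site_overloads" + "." + OVERLOAD_SITE
    # Pull every site-specific setting into this module's namespace
    exec("from %s.settings import *" % (OVERLOAD_SITE_MODULE))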

The os.environ.get() function in the above code extracts the value of our desired variable, “OVERLOAD_SITE”, from the operating environment. Using this value, it determines the correct import path to the site-specific settings. Finally, with exec, it imports the site-specific settings.

To reiterate our general approach:
1. Use Apache to pass off requests to Django, but before doing so, set an environment variable dependent upon the request url.
2. Read this environment variable from within the project’s globally shared settings.py and determine the desired site.
3. From within the globally shared settings.py, import the desired site-specific settings module.
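Steps 2 and 3 can also be sketched without exec, using importlib. This is an alternative illustration rather than the code used in this article: the helper name load_site_settings is made up, and it assumes the site_overloads/<site>/settings.py layout described earlier. It copies only uppercase names, which are the ones Django reads from a settings module.

```python
import importlib
import os


def load_site_settings(namespace, environ=None):
    """Copy the site-specific settings for OVERLOAD_SITE into namespace.

    load_site_settings is a hypothetical helper for this sketch. It returns
    the dotted path of the site package, or None when no site is selected.
    """
    if environ is None:
        environ = os.environ
    site = environ.get("OVERLOAD_SITE")
    if not site:
        return None
    module_path = "site_overloads.%s" % site
    module = importlib.import_module(module_path + ".settings")
    for name in dir(module):
        # Django only treats uppercase names as settings
        if name.isupper():
            namespace[name] = getattr(module, name)
    namespace["OVERLOAD_SITE_MODULE"] = module_path
    return module_path
```

Inside the globally shared settings.py this would be called as load_site_settings(globals()), after the default settings have been defined so that the site-specific values win.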

Configuring the site-specific settings to override global settings

Django Components in a Franchised Project

Component   Globally-Shared   Site-Specific
database                      x
models      x
urls        x                 x
views       x                 x
settings    x                 x
templates   x                 x

Earlier, we defined the boundaries between globally shared components and site-specific components in a franchised Django project (see above table). With our dynamic configuration setup in place, we can now force site-specific components such as the database, urls, views, and templates to override the default global components. The key to doing so is defining our intentions in the site-specific settings.py. From our NASA research-tracker example, that file would be site_overloads/nasa/settings.py.

Setting The Site-Specific Database

By adding these few lines, we can specify to Django the exact database to use for our NASA site.

site_overloads/nasa/settings.py

import os

# Overload the default database
DATABASE_NAME = os.path.join("site_overloads/nasa/", "database.sqlite3")
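One caveat with a relative path like this is that it is resolved against the working directory of the running process, which under a web server may not be the project directory. A possible workaround, sketched here with a made-up helper name site_path, is to build the path from the location of the settings file itself.

```python
import os


def site_path(settings_file, *parts):
    """Return an absolute path located next to the given settings file.

    site_path is a hypothetical helper name used only in this sketch.
    """
    base_dir = os.path.dirname(os.path.abspath(settings_file))
    return os.path.join(base_dir, *parts)
```

In site_overloads/nasa/settings.py this could be used as DATABASE_NAME = site_path(__file__, "database.sqlite3"), so the database is found regardless of where the server process was started.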

Overriding Global Templates with Site-Specific Templates

To make sure that templates in the site_overloads directory override the globally shared templates, we have to modify both the globally-shared and site-specific settings.py files like so:

site_overloads/nasa/settings.py

# Overload Default TEMPLATE_DIRS
SITE_TEMPLATE_DIRS = [
    "site_overloads/nasa/templates",
]

globally-shared settings.py

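# Prepend a list of site-specific template directories to the TEMPLATE_DIRS.
# globals().get() avoids a NameError when the site defines no SITE_TEMPLATE_DIRS.
SITE_TEMPLATE_DIRS = globals().get("SITE_TEMPLATE_DIRS", [])
if SITE_TEMPLATE_DIRS:
    # Convert both sides so a list and a tuple can be joined safely
    TEMPLATE_DIRS = tuple(SITE_TEMPLATE_DIRS) + tuple(TEMPLATE_DIRS)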

As a result of these settings, when rendering templates, files in the site_overloads/nasa/templates folder will always be chosen over files in the globally shared templates folder, even if they share the same name.
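The reason the ordering matters is that template directories are searched front to back and the first hit wins. The function below is a simplified model of that lookup, not Django's actual loader code; find_template and the directory names in the usage note are assumptions for illustration.

```python
import os


def find_template(name, template_dirs):
    """Return the first existing file called name, searching dirs in order."""
    for directory in template_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    # No directory contains the template
    return None
```

With template_dirs set to ["site_overloads/nasa/templates", "templates"], a base.html that exists in both folders resolves to the NASA copy, while a template that only exists in the shared folder is still found there.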

Adding Site-Specific URLs

In the globally-shared urls.py, add these few lines of code to dynamically import the correct site-specific urls:

urls.py

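# Import site-specific urls (place this after the shared urlpatterns are defined)
from importlib import import_module
from django.conf import settings
if hasattr(settings, "OVERLOAD_SITE_MODULE"):
    site_urls = import_module(settings.OVERLOAD_SITE_MODULE + ".urls")
    # Site-specific patterns go first so they take precedence
    urlpatterns = site_urls.urlpatterns + urlpatterns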

With this in place you can now add url patterns as normal under your site-specific urls.py.
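Placing the site-specific patterns in front matters because URL patterns are tried in order and the first match is used. The toy resolver below models that behaviour with plain regular expressions; it is not Django's resolver, and the pattern and view names are invented for the example.

```python
import re


def resolve(path, urlpatterns):
    """Return the view name of the first pattern whose regex matches path."""
    for regex, view_name in urlpatterns:
        if re.search(regex, path):
            return view_name
    return None


# Hypothetical patterns: the shared project defines two pages,
# and the NASA site overrides one of them.
shared_patterns = [
    (r"^projects/$", "shared_project_list"),
    (r"^about/$", "shared_about"),
]
site_patterns = [
    (r"^projects/$", "nasa_project_list"),
]
urlpatterns = site_patterns + shared_patterns
```

Here resolve("projects/", urlpatterns) picks the NASA view, while "about/" still falls through to the shared one.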

Testing the setup

After making all of those adjustments, you’ll most likely want to try out your new setup without having to start up Apache. A good way to test our solution is to manually set the environment variable before running “manage.py runserver” or “manage.py shell”. Type the following on the command line to run a shell using the site-specific setup. Once inside the Django shell we can poke around at the various site-specific settings and verify that they are what we expect:

OVERLOAD_SITE=nasa python manage.py shell
>>> from django.conf import settings
>>> print( settings.OVERLOAD_SITE )
nasa
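When scripting such a check, the variable has to end up in the environment of the child process itself. The sketch below shows one way to do that from Python with subprocess; read_site_in_child is a made-up helper, and the child here is a bare Python one-liner standing in for manage.py.

```python
import os
import subprocess
import sys


def read_site_in_child(site):
    """Start a child Python process with OVERLOAD_SITE set and return what it sees."""
    # Copy the current environment and add the site selection to it
    child_env = dict(os.environ, OVERLOAD_SITE=site)
    output = subprocess.check_output(
        [sys.executable, "-c",
         "import os; print(os.environ.get('OVERLOAD_SITE'))"],
        env=child_env,
    )
    return output.decode().strip()
```

Swapping the one-liner for a call to manage.py would run a management command against one specific site without touching the parent process's environment.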

Summary

In this article, I outlined how to franchise your Django project. As our main approach, we used Apache’s ability to set environment variables to dynamically configure Django before responding to any requests. Because of this, we could dynamically route users to site-specific views, urls, templates, and databases depending on the incoming request url.

Franchising allows you to run many Django sites from a single code base. It’s ideal for those times when you want to reuse code that you have written for one site on another project. There is no need to maintain several branches of slightly different code, and as a result, improvements made on one site will simultaneously apply to all sites.