Edited on May 29: After writing this article, Christian Heimes pointed out to me that the debug method on unittest.TestCase allows you to run tests interactively. The idiot in me didn’t do enough research before writing this rant. I’m leaving this post up to provide some context for future visitors/googlers who are looking for a quick way to debug their test cases interactively.
One lesson I’ve learned over many years of observation and programming in Python is that the easier I make writing and running tests for my programs, the faster I can produce bug-free software. That is why I start most projects nowadays with two pre-baked files: main.py and tests.py.
With this setup and tools like Nose, testing code in Python is nearly pain-free for me. Recently, however, I noticed that one aspect of the unittest package in Python’s standard library drives me absolutely crazy:
Why is there no simple method to run a single test case from within an interactive python shell?
This matters to us as developers: if we are going to make writing tests an integral part of our development workflow, we need some way to actually run the damn test without dropping out of our Python sessions and breaking our mental flow.
Being forced to run tests solely from the command line also means we can’t take advantage of features like IPython’s ability to automatically drop into a pdb debugging session when an error occurs. This is extremely useful when you want to inspect one or two variables to determine why a test failed.
In general, it seems like the unittest library’s lack of a simple function to run single test cases in an interactive shell discourages programmers from writing tests and detracts from Python’s whole “rapid iteration” ethos.
After googling around and failing to find any good alternatives, I’ve finally settled on writing my own utility function to run single test cases:
def runtest(testCase, methodName):
    """ Runs a test case from within interactive shell """
    tc = testCase(methodName)
    getattr(tc, "setUp", lambda: None)()
    try:
        getattr(tc, methodName)()
    finally:
        getattr(tc, "tearDown", lambda: None)()
I now add this function to every project I create, and writing tests feels much more natural. I also find myself writing more tests and running them more often with this function in my projects.
Give it a try, and let me know whether this has any effect for you. I’d love to hear your feedback on how it affects your development workflow.
Excel has always been my go-to tool for exploring the abundance of csv-formatted data that I often find on the internet. However, every once in a while large datasets like these H1B salary figures show up, where the sheer number of records exceeds Excel’s 65k row limit.
In these scenarios I turn to sqlite and a recently discovered project, apsw, for my data analysis-fu.
Importing data into sqlite3
To start, I load up the data using a script I created a while back called csv2sql.py. Although you can use sqlite to directly load in csv files, I often find the functionality chokes on slightly malformed csv files. So here’s my short script that reinvents the wheel a little better using Python:
Usage: csv2sql CSVFILE

Options:
  -h, --help  show this help message and exit
  -o DBNAME   output sqlite3 database file
  -t TABLE    default table name for data import

#!/usr/bin/env python
import sys
import csv
import re
import codecs
import optparse as op
import sqlite3 as db
from itertools import islice

parser = op.OptionParser()
parser.usage = "%prog CSVFILE"
parser.add_option('-o', action="store", dest="dbname", default="data.db", help="output sqlite3 database file")
parser.add_option('-t', action="store", dest="table", default="Records", help="default table name for data import")

def main(fpath, table, dbname):
    # errors='ignore' silently drops bytes that are not valid utf-8
    with codecs.open(fpath, encoding='utf-8', errors='ignore') as fh:
        rows = csv.reader(fh)
        # turn headers such as "Job Title" into safe column names (JOB_TITLE)
        slugify = lambda s: re.sub(r'\W+', "_", s).upper()
        fields = [slugify(f) for f in next(rows)]
        schema = "\n\t\t" + ",\n\t\t".join("%s text" % k for k in fields)

        print("Creating database: %s\n" % dbname)
        conn = db.connect(dbname)
        c = conn.cursor()

        # create table with schema
        stmt = "CREATE TABLE %s (%s);" % (table, schema)
        c.execute(stmt)
        print("Schema")
        print("======")
        print("\n%s\n" % stmt)

        # insert in batches of 2000 rows, using the same slugified
        # column names that the CREATE TABLE statement used
        print("Adding records to database")
        stmt = "INSERT INTO %s (%s) VALUES (%s);"
        stmt = stmt % (table, ", ".join(fields), ", ".join(['?'] * len(fields)))
        to_add = list(islice(rows, 2000))
        while to_add:
            sys.stdout.write('.')
            c.executemany(stmt, to_add)
            to_add = list(islice(rows, 2000))
        conn.commit()
        conn.close()

if __name__ == '__main__':
    opts, args = parser.parse_args()
    if not args:
        parser.error("missing CSVFILE argument")
    main(fpath=args[0], table=opts.table, dbname=opts.dbname)

Exploring the data with apsw

After importing the csv file, I typically run SQL queries against the database in a sqlite shell to explore the data. Apsw is a Python wrapper for sqlite3, which also includes an enhanced shell with features like tab completions and output modes for json and Python tuples. These two things combined make interactive data exploration extremely pleasant.
Once you have apsw installed, create a short alias in your .bashrc file so that you can invoke the enhanced sqlite shell from the command line:
alias sqlite='python -c "import apsw;apsw.main()"'
With this in place, you can invoke the shell by typing sqlite data.db at your terminal prompt. Those familiar with the sqlite shell know that it can output the results of a query in several different formats (e.g. column, csv, line). To set the output mode, simply type .mode <MODENAME> into the prompt before executing your queries.
The two most useful output formats that apsw provides are the “json” and “python” modes. Here’s what the following SQL query looks like after setting up “json” and “python” modes respectively.
SELECT job_title, wage_from, avg(wage_from)
FROM main
WHERE job_title LIKE '%SOFTWARE%' COLLATE NOCASE
GROUP BY job_title
ORDER BY wage_from
LIMIT 10;


As you can see in the above example, this data is readily consumable in any standard Python shell. Copy and paste the output into a Python interpreter for further analysis, and use matplotlib to visualize the data.
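As a rough sketch of that last step, assume the rows have been pasted into a list of Python tuples. The rows below are placeholders with made-up values in the shape (job_title, wage_from); they are not taken from the H1B dataset, and the variable names are my own.

```python
from collections import defaultdict

# Placeholder rows for illustration only; the values are invented and
# do not come from the H1B dataset.
rows = [
    ("SOFTWARE ENGINEER", 60000.0),
    ("SOFTWARE ENGINEER", 80000.0),
    ("SOFTWARE DEVELOPER", 70000.0),
]

# Group the wages by job title.
wages_by_title = defaultdict(list)
for title, wage in rows:
    wages_by_title[title].append(wage)

# Average wage per title, printed alphabetically by title.
averages = {t: sum(w) / len(w) for t, w in wages_by_title.items()}
for title in sorted(averages):
    print("%-20s %10.2f" % (title, averages[title]))
```

From here the averages dictionary can be handed to whatever plotting or analysis code you prefer.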
What Python’s Matplotlib offers in flexibility, it sorely lacks in aesthetics. Compared to graphs produced using ggplot2, the graphs I make using matplotlib strain my eyes. For the most part, this is caused by its default color scheme, which the creator, John Hunter, optimized for display in publications rather than on the web.
I spent some time over the weekend refining the default colors and settings for my matplotlib installation. The result of this work is embodied in this .matplotlibrc color theme file. If you want graphs that look like the ones below by default, download it and place the file under ~/.matplotlib/matplotlibrc.
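If you would rather experiment before replacing your config file, the same kind of settings can be applied at runtime through matplotlib.rcParams. This is only a sketch: the overrides below are hypothetical values I picked for illustration, not the contents of the theme file.

```python
import matplotlib as mpl

# Hypothetical overrides for illustration; these are NOT the values
# from the theme file mentioned above.
overrides = {
    "axes.facecolor": "white",
    "axes.grid": True,
    "grid.color": "0.85",
    "lines.linewidth": 2.0,
    "font.size": 12.0,
}
mpl.rcParams.update(overrides)

# Path of the matplotlibrc file that Matplotlib loaded at import time.
print(mpl.matplotlib_fname())
```

The printed path is a quick way to confirm which matplotlibrc your installation actually reads, since the config directory can differ between platforms and versions.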




