Installing pip dependencies without touching the ‘net.

@jacobian just tweeted:

A little shitty-wifi-inspired hack to make pip install not have to touch the ‘net at all: http://bit.ly/IRrRcn

Yeah, been there, done that (using PIP_DOWNLOAD_CACHE).  It’s a good idea, but pip itself has better support for doing this.  I learned this technique from the pip development team, specifically @carljm over IRC and some bugs.

“pip install --no-install” first

Use an “sdist cache” and not PIP_DOWNLOAD_CACHE.  An “sdist cache” caches the actual distributed files, not the “pip-ified” files from PyPI.  Pick a directory to store these sdist files in.  From here on out, I’m going to assume you’re putting them in $SDIST_DIR, wherever you decide that should be.

If you’re adding a new dependency, and you want to be able to install it later without touching the ‘net, you need to download it first, and then install it from that download.  For example, if I wanted to include Django, I’d do this:

pip install --no-install --no-input --use-mirrors -I --download=$SDIST_DIR django

Which will put a file named something like Django-1.4.tar.gz (note the nice filename!) into $SDIST_DIR.  You can then put $SDIST_DIR under version control.

“pip install --find-links” second

Then, you can install django (or any other dependency that you’ve previously downloaded) without touching the ‘net by executing:

pip install -I --find-links=file://$SDIST_DIR --no-index --index-url=file:///dev/null django
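
If you’d rather drive these two steps from Python than from the shell, something like the following might do. It’s only a sketch: the function names and the sdist_dir argument are made up for illustration, and the flags are simply the ones shown above, so whether they work depends on the pip version you have installed.

```python
import subprocess


def download_cmd(sdist_dir, requirement):
    """Build the 'download but don't install' command from this post."""
    return ["pip", "install", "--no-install", "--no-input", "--use-mirrors",
            "-I", "--download=" + sdist_dir, requirement]


def offline_install_cmd(sdist_dir, requirement):
    """Build the 'install only from the sdist cache' command from this post."""
    return ["pip", "install", "-I",
            "--find-links=file://" + sdist_dir,
            "--no-index", "--index-url=file:///dev/null", requirement]


def run(cmd):
    """Run one pip command; raises CalledProcessError if pip fails."""
    subprocess.check_call(cmd)


if __name__ == "__main__":
    # Hypothetical cache location; point this at your own $SDIST_DIR.
    print(" ".join(offline_install_cmd("/path/to/sdist", "django")))
```

Keeping the command lines as lists (rather than one big string) means a directory name with spaces in it won’t get split by a shell.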

Use requirements.txt, but not like they taught you

Unfortunately, this technique breaks “pip install -r requirements.txt”.  (I don’t remember the exact details, but I do remember it’s broken.)  But the format of requirements.txt is simple enough that you can basically say:

for dependency in $(cat requirements.txt); do
    pip install -I --find-links=file://$SDIST_DIR --no-index --index-url=file:///dev/null $dependency
done

Just put this into a shell script to make your life easier, which leads us to…

Wrap it all up into a collection of shell scripts

Now that you know the general technique, you’ll need to wrap these two up into a couple different shell scripts.  Here’s what I do (without source — but I’ll share soon).

./add_dependency.sh: Download a new single dependency, per the pip line above, and then immediately install it.  This leaves a file in $SDIST_DIR, but that’s good, because it reminds me (via source control) that I’m out of sync with what everyone else thinks the dependencies are.

./download_all_dependencies.sh: Run “pip freeze” and download every package currently installed into the current virtualenv.  This is good because oftentimes “pip install foo” will download several dependencies, and the ./add_dependency.sh script above doesn’t properly handle those cases.  I think this is a bug in pip.

./install.sh: Take “requirements.txt” and process it line-by-line running the “install but don’t download” commandline from above.
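
One catch with the $(cat requirements.txt) loop is that the shell splits on whitespace, so a comment line in requirements.txt turns into several bogus “dependencies”. Here’s a rough sketch of the line-by-line processing in Python instead. The function name and the sample file contents are hypothetical, and skipping option lines like “-e …” is my own assumption about what you’d want, since those aren’t in the sdist cache anyway.

```python
def iter_requirements(text):
    """Yield one requirement per line, skipping blanks, comments and options."""
    for raw_line in text.splitlines():
        line = raw_line.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        yield line


# Hypothetical requirements.txt contents, just for the demo.
SAMPLE = """\
# web framework
Django==1.4

example-package==1.0  # hypothetical pin
-e git+https://example.invalid/repo.git#egg=thing
"""

for requirement in iter_requirements(SAMPLE):
    print(requirement)
```

Each requirement that comes out of this can then be handed to the “install but don’t download” command line from above.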


Make your tests 7x faster in Django 1.4

In Django 1.4, the default password hasher has been switched to the extremely secure PBKDF2 algorithm.

But each password hash computed with PBKDF2 can take a pretty long time (on my system, about 150ms for each hashing, which happens twice per unit test that I’m writing). For your test cases (which probably create users and log them in and out), this extra security is pointless, and runtime is paramount.

So, create a custom settings.py for your test cases, and set the PASSWORD_HASHERS setting to exclude PBKDF2. You can also use this technique to set up an in-memory sqlite3 database as your backend, which also speeds things up quite a bit. Here’s a snippet from my settings_test.py:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': ':memory:',
        'USER': '',                      # Not used with sqlite3.
        'PASSWORD': '',                  # Not used with sqlite3.
        'HOST': '',                      # Not used with sqlite3.
        'PORT': '',                      # Not used with sqlite3.
    }
}

PASSWORD_HASHERS = (
    # 'django.contrib.auth.hashers.PBKDF2PasswordHasher',
    # 'django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher',
    # 'django.contrib.auth.hashers.BCryptPasswordHasher',
    'django.contrib.auth.hashers.SHA1PasswordHasher',
    'django.contrib.auth.hashers.MD5PasswordHasher',
    # 'django.contrib.auth.hashers.CryptPasswordHasher',
)

Note that you have to keep the SHA1 hasher enabled for the django.contrib.auth tests to work. You can select the test settings either with something like:

DJANGO_SETTINGS_MODULE='yourapp.settings_test' django-admin.py test

(probably putting this in a Makefile or shell alias) or I’ve also seen people put this at the bottom of their default settings, so the test values override the defaults:

import sys

if 'test' in sys.argv:
    from settings_test import *

These 2 changes took my test run time from 15.5 seconds down to 2.0 seconds, an improvement of about 7x! Woot!
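
If you want to see where the time goes without firing up Django at all, here’s a rough stdlib-only comparison. It’s a sketch, not a benchmark of Django’s hashers: I’m assuming 10,000 iterations of PBKDF2 with SHA256 (check the default for your Django version), and a single salted SHA1 call as a stand-in for the cheap hasher. The password and salt are made-up demo values.

```python
import hashlib
import time

PASSWORD = b"correct horse battery staple"  # demo value
SALT = b"salt-for-demo"                     # demo value
ITERATIONS = 10000  # assumed work factor; check your Django version's default


def timed(func):
    """Return (result, elapsed_seconds) for a single call to func."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


# Key stretching: thousands of HMAC rounds per password.
slow_digest, slow = timed(
    lambda: hashlib.pbkdf2_hmac("sha256", PASSWORD, SALT, ITERATIONS))
# One plain salted hash, which is all the cheap hasher needs.
fast_digest, fast = timed(
    lambda: hashlib.sha1(SALT + PASSWORD).digest())

print("PBKDF2: %.2f ms, SHA1: %.4f ms" % (slow * 1000, fast * 1000))
```

The absolute numbers will vary by machine, but the gap is the point: the slow hasher is slow on purpose, which is exactly what you don’t need in a test run.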

Array valued Form fields in Django.

So, you want to pass an array of values into a Form in Django.  It’s not exactly obvious what the right solution is. You could read up on MultiValueField (https://docs.djangoproject.com/en/dev/ref/forms/fields/#multivaluefield) or you could read about widgets.MultipleHiddenInput (https://docs.djangoproject.com/en/dev/ref/forms/widgets/#multiplehiddeninput) but you’ll realize that neither of these allows for custom validation of the individual entries.

Here’s a generic ArrayField that might be of use:

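from django import forms
from django.forms.widgets import MultipleHiddenInput


class ArrayField(forms.Field):

    def __init__(self, *args, **kwargs):
        # base_type is a field instance, e.g. forms.CharField(max_length=3).
        self.base_type = kwargs.pop('base_type')
        # MultipleHiddenInput hands clean() every value posted under this name.
        self.widget = MultipleHiddenInput
        super(ArrayField, self).__init__(*args, **kwargs)

    def clean(self, value):
        # Check each raw entry, then let the base field convert it and
        # run its validators (max_length and friends).
        for subvalue in value:
            self.base_type.validate(subvalue)

        return [self.base_type.clean(subvalue) for subvalue in value]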

Here’s the code I’m using to unit test this puppy:

from django import forms
from django.http import QueryDict


class TestCharArrayForm(forms.Form):
    multi_char = ArrayField(base_type=forms.CharField(max_length=3))


class TestArrayField(ExaTestCase):

    def test_array_good(self):
        query_dict = QueryDict('a=1', mutable=True)
        query_dict.setlist('multi_char', ('abc', 'def', 'ghi'))
        test_form = TestCharArrayForm(query_dict)
        self.assertTrue(test_form.is_valid())
        self.assertEqual(test_form.cleaned_data['multi_char'],
                         ['abc', 'def', 'ghi'])

    def test_array_invalid(self):
        query_dict = QueryDict('a=1', mutable=True)
        query_dict.setlist('multi_char', ('abcd' * 10, # too long
                                          'deff' * 10,
                                          '1234' * 10))
        test_form = TestCharArrayForm(query_dict)
        self.assertFalse(test_form.is_valid())

It would be very straightforward to add extra fields on ArrayField to check for number of items in the array or any other characteristics you want.
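
To give a feel for what an item-count check could look like, here’s a framework-free sketch of the same clean-every-entry idea. Nothing here is Django: ValidationError and clean_max_length_3 are stand-ins I made up so the snippet runs on its own, and min_items/max_items are hypothetical parameter names. In the real ArrayField you’d pop them from kwargs the same way base_type is.

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError."""


def clean_max_length_3(value):
    """Stand-in for forms.CharField(max_length=3).clean()."""
    value = str(value)
    if len(value) > 3:
        raise ValidationError("Ensure this value has at most 3 characters.")
    return value


def clean_array(values, clean_item, min_items=0, max_items=None):
    """Check the item count, then clean every entry with clean_item."""
    values = list(values)
    if len(values) < min_items:
        raise ValidationError("Expected at least %d items." % min_items)
    if max_items is not None and len(values) > max_items:
        raise ValidationError("Expected at most %d items." % max_items)
    return [clean_item(value) for value in values]


print(clean_array(["abc", "def"], clean_max_length_3, max_items=3))
```

Checking the count before cleaning the entries means an oversized array gets rejected without validating every item in it first.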

Adding an array of values to a Django form

It’s possible to have an array-valued field in a Django Form, it’s just really, really not clear how to do it.

Background: I’m writing a series of REST APIs using a Django backend, and I like to define the parameters for POST and PUT as Django Forms objects.  I’m never rendering my Forms as HTML, as they just define the API.

In some cases, I’d like to pass an array of values.  Let’s say, an array of string tags for a blog post in the POST method that creates a blog post.  The form for this API would look like this:

class CreateBlogForm(forms.Form):
    title = forms.CharField(max_length=2000)
    body = forms.CharField()
    tags = forms.CharField(widget=forms.MultipleHiddenInput)

Then, in my View code, I would write a snippet that looked like this:

    blog_form = CreateBlogForm(request.POST)
    if not blog_form.is_valid(): 
        raise Exception("Invalid form")
    blog_data = blog_form.cleaned_data 
    blog = Blog.objects.create(title=blog_data['title'], body=blog_data['body'])
    for tag in blog_data['tags']:
        blog.add_tag(tag)

Note how I’m accessing the tags members as an array? Exactly what I wanted!
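
In case it’s not obvious what the client has to send: an “array” in a form-encoded POST is just the same key repeated. Here’s a small stdlib sketch of building and parsing such a body. The field values are made up, and parse_qs is only standing in for what request.POST.getlist() would give you on the Django side.

```python
from urllib.parse import parse_qs, urlencode

# A list of pairs (not a dict), so the "tags" key can appear twice.
fields = [
    ("title", "Hello"),
    ("body", "First post"),
    ("tags", "python"),
    ("tags", "django"),
]
body = urlencode(fields)
print(body)

# Parsing the body back groups the repeated key into a list.
parsed = parse_qs(body)
print(parsed["tags"])
```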

Jenkins workspace archiving breaks on symlinks.

Our Jenkins build was working great (archiving one workspace, and then untarring it into another using the Archive for Clone Workspace feature) and then one day it broke.

The error was in the second build job, and said:

	at hudson.model.Run.run(Run.java:1421)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:238)
Caused by: java.io.IOException: Failed to chmod /var/lib/jenkins/jobs/oswebsite_test/workspace/wsve/lib/python2.6/UserDict.py : Operation not permitted
	at hudson.FilePath._chmod(FilePath.java:1248)
	at hudson.FilePath.readFromTar(FilePath.java:1813)
	... 16 more

The issue is Jenkins Bug 13280 which basically says that the “Archive for Clone Workspace” feature is broken if your workspace contains symlinks.  Hopefully that bug will be fixed soon.

My workaround was to just set a “custom workspace directory” for the second job to be the workspace directory of the first job.  Not a clean solution, but it gets things done.

emacs, tramp, ido, dbus and avahi

Yeah, that’s quite a cast of characters isn’t it?

If you’re using emacs, and inside emacs you use ido-mode, then it is, by default, attempting to use “tramp completion” which, by default, is using dbus, which, by default, uses avahi (i.e. zeroconf) to browse your network for shares.

What this means is that if you use ido, and you’re on a “big network” (with lots of avahi/zeroconf/rendezvous hosts) then you’ll see a noticeable slowdown in opening files. The solution is:

M-x customize-group ido

and turn off ido-enable-tramp-completion

My .emacs.d/init.el has this line in the custom-set-variables section:

'(ido-enable-tramp-completion nil)

Then, it won’t use tramp, and it won’t use dbus and it won’t use avahi and you’ll be able to swiftly open files again. Whew!

PyCon attendees: Do fun things while you’re here!

Hey, so, you’re in town for PyCon and you’re staying at or near the Santa Clara Convention Center.

Your first reaction is: Ugh, is this really as cool as Silicon Valley is?  The answer is NO!  Of course not! Santa Clara is kind of the armpit of the valley, so it’ll kind of be a shame if that’s all you see.

Get out of there and go visit someplace!  It’s fairly easy to get to Downtown Mountain View via the light rail that stops right there in front of the SCCC.  It’s a bit slow (45 minutes) but if you don’t have a car, it’s a cheap & easy option. Mountain View is much more representative of what Silicon Valley is really like.

Downtown Mountain View has some great restaurants, all within easy walking distance of the train station, and most are open late.  Here are my suggestions:

  • Xanh:  Great Vietnamese.  Even if you don’t know Vietnamese cuisine, you’ll still have a great meal with a cool atmosphere.
  • Shabu Way: A Japanese-style “hotpot” restaurant.  You’ll get a huge plate of raw beef and a boiling pot to dip & cook it in.
  • Molly MaGee’s: A fun Irish-style pub.  Great beer selection and traditional pub fare.
  • Red Rock Coffee: The upstairs is a virtual incubator of early stage startups.  You’ll be sitting right next to all kinds of cool people.  Chat it up if you dare.
  • Sushi Tomi: The best sushi restaurant in Mountain View.
  • Cascal: A Spanish tapas restaurant with a really fun atmosphere.
  • Kappo Nami Nami: A great Kyoto-style Japanese restaurant.
  • Tied House: Another good local brewery.
  • Taqueria La Bamba or Los Charros:  Both are great taquerias (casual Mexican restaurants) that have their own rabid followings.
  • Fiesta Del Mar Too: An awesome regular Mexican restaurant.

These days, Mountain View is full of early stage and well-known startups, and if you listen carefully, you might find that you’re sitting right next to someone really, really cool.

If you have a car, and can drive a bit, the following things are worth a quick drive-by:

And if you have a few hours to kill (and a car), I’d recommend:

That’s my short list.  I’m sure there are a million other things to do, so feel free to comment and I’ll update the post with other suggestions.  See you at PyCon!

More thoughts on RESTfulness.

Here are some more brief thoughts on why extreme RESTfulness is a bad idea:

As soon as you’re debating whether an API endpoint should be a PUT, POST or PATCH, you’re wasting your time.

If you, as the API developer, can’t decide which method is appropriate for a given action, then you’re almost certainly designing your APIs in a way that’s making them difficult to use.  For example:

What HTTP method would you use for updating the password on a user account?

  1. Use HTTP PUT because it’s a “modification of the User object.”  But technically PUT is supposed to replace the whole resource (and be idempotent), so you have to send the full user record in the request.  Users aren’t allowed to update some aspects of their data, like username, so you send these fields to be RESTful but just ignore the data in your backend.  To make this happen, you have to roundtrip the entire user record to the client, and use a PUT with an If-Unmodified-Since header to avoid race conditions.
  2. Use HTTP PATCH because “that’s the more restful way to do it now that we know about the new PATCH method” and you use a JSON encoding, which is nice for you as the developer, but hard for clients to issue (think: wget clients).
  3. Use HTTP POST because “it’s what the web has been doing for the last decade.”  It works, it’s simple, and well defined.  It’s fully supported by every HTTP client out there.  Everybody understands how to POST and what it means.

You tell me what the right answer is.

Striving for the perfect RESTful API is a fool’s errand.

There’s always a lot of talk on programmers’ blogs about RESTful APIs.  Of particular note are articles like this one, which talks about how Ruby on Rails is switching to a brand new HTTP method “PATCH”.

Here’s the big issue:

API design should have nothing to do with which transport you’re using.

and

HTTP is a transport layer.

Your API should (and probably will) work over at least two possible transports.  The first “transport” is local function calls inside your application itself.  The second transport is probably going to be JSON or XML over HTTP, and there are tons of other possible transports people might ask for or want to use, like Google Protobufs over WebSockets.
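
Here’s a tiny sketch of what I mean by the same API over two transports. Everything in it is hypothetical: the create_post function, the in-memory POSTS list, and the {"method": ..., "params": ...} JSON envelope are just illustrations, not a recommendation for a particular wire format.

```python
import json

POSTS = []  # in-memory stand-in for a real data store


def create_post(title, body, tags=()):
    """The API itself: a plain function, callable from inside the app."""
    post = {"id": len(POSTS) + 1, "title": title, "body": body,
            "tags": list(tags)}
    POSTS.append(post)
    return post


# Registry of callable API functions, keyed by name.
API = {"create_post": create_post}


def handle_json(raw):
    """One possible transport adapter: JSON text in, JSON text out."""
    request = json.loads(raw)
    result = API[request["method"]](**request["params"])
    return json.dumps(result)


# Transport 1: a local function call.
local = create_post("Hello", "First post", tags=["python"])
# Transport 2: the same function reached through a JSON envelope.
remote = json.loads(handle_json(
    '{"method": "create_post", "params": {"title": "Hi", "body": "Again"}}'))
print(local["id"], remote["id"])
```

The design of create_post doesn’t change depending on how the call arrives. The adapter is the only part that knows about the encoding.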

It needs to be as easy as possible for your clients to issue requests to your API.

Yes, it really does.  Make it easy.  Make it dead simple to use your API.  As soon as you start down the path of “use form-encoded fields and HTTP POST to create objects” and “use <some encoding TBD because PUT doesn’t dictate an encoding> and HTTP PUT to update objects” you’re doing it wrong.  Your clients don’t care about the transport (PUT vs. POST) they just want to get, create, update and delete objects, and they want to be able to easily issue those requests from whatever systems they already have.  In the case of HTTP, this will probably be wget, curl, in-browser JavaScript and backend server-side libraries.  Making your clients explicitly choose between the poorly named PUT & POST is just nonsense.  Making them shoe-horn an alternate encoding or even form-encoding for PUT requests is nonsense.  They don’t care at all.  In fact, they probably don’t even care what the URLs are.  They just want your API to be easy to call and to work reliably.

Here’s a perilous RESTy example:

Imagine your prototypical web-based chat application.  You’re going to need a way to say “get me new chat messages on a given channel.”  So, you come up with a URL like this:

/channel/<channel name>/updates

Your RESTy API design says “Use an HTTP GET and the If-Modified-Since header to ask for new messages since the timestamp of the newest chat item”.  This sounds good.  It sounds right. It sounds RESTful.

Okay, great, so you implement the whole thing in your backend.  You have your unit tests testing it, issuing requests to a test server via a great HTTP library. Awesome.  It all works.

You pass off the API to your frontend development team.   They say “hey, this looks great!”  A week and a half later, they come back to you and say:

I can’t figure out how to properly get chat updates!  Every time I use the API I get all the chat messages, not just the new ones!

Ah! I know exactly what the issue is!  You’re not properly setting the If-Modified-Since header!  So, you go over to their desk and sit down, and they pull up Chrome’s Developer Tools.

See, here’s where I’m making the request from jQuery to your API.  I’m asking for /channel/foobar/updates via $.ajax(…).  So, HOW THE HECK DO I SET a custom value for If-Modified-Since?

Reading the documentation, I believe this is possible via some combination of the ifModified flag to $.ajax(), and/or the beforeSend function, and the XHR setRequestHeader function. But, you’ll quickly start reading jQuery bug reports and Google Groups posts about why this approach might not work, and we haven’t even started talking about how the string for the If-Modified-Since header has to be formatted in a fairly particular way, so you’ll probably need a custom date formatting library.  Your coworker on the frontend team might say:

Hey, so why can’t I just pass a timestamp, or better yet an object sequence number, as a GET parameter and skip this If-Modified-Since header?  Even if I can get it working, it’s going to be half a dozen extra lines of code or a special utility function every time I call your API.

“But that’s not RESTful!” you’ll shout.

And then you’ll realize the mistake you’ve made.  In your quest to fully exploit HTTP, you’ve made it pretty hard for your clients to actually call your API.
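
To make the difference concrete, here’s a sketch of what a client has to produce in each case, using only the Python standard library. The “since” parameter name and the helper names are hypothetical. The date format is the one HTTP date headers use.

```python
from email.utils import formatdate
from urllib.parse import urlencode


def if_modified_since_header(timestamp):
    """Format a Unix timestamp the way HTTP date headers expect."""
    return {"If-Modified-Since": formatdate(timestamp, usegmt=True)}


def updates_url(channel, since_seq):
    """The alternative: carry the cursor in the query string."""
    return "/channel/%s/updates?%s" % (channel, urlencode({"since": since_seq}))


print(if_modified_since_header(0))
print(updates_url("foobar", 42))
```

Python happens to ship a formatter for that date string. The frontend developer in the story has to find or write one, and then also convince their AJAX library to send the header.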

Changing the Jenkins-CI port on Ubuntu

So, you’ve installed Jenkins on Ubuntu (following the directions on their Wiki page) but you want it to listen on a different port (the default is 8080).  I’m really, really surprised that there’s not a config file for this.  You have to edit /etc/init.d/jenkins.

Add the following two lines near the top of the file, right after the DAEMON_ARGS line:

HTTP_PORT=8888
JENKINS_ARGS="--httpPort=$HTTP_PORT"