Holy crap locales!

Here’s something fun to try.  Create a text file that looks like this (Note: utf-8 encoded!):

A
ß
C
ßa
ßz
a
B
b
c
S
s
SS
ss
SA
sa
SZ
sz

Just to be really clear, here are the exact bytes I’m talking about:

$ hexdump -C  /tmp/foo.txt 
00000000  41 0a c3 9f 0a 43 0a c3  9f 61 0a c3 9f 7a 0a 61  |A....C...a...z.a|
00000010  0a 42 0a 62 0a 63 0a 53  0a 73 0a 53 53 0a 73 73  |.B.b.c.S.s.SS.ss|
00000020  0a 53 41 0a 73 61 0a 53  5a 0a 73 7a 0a           |.SA.sa.SZ.sz.|
$ md5sum /tmp/foo.txt
ac2be5e453dd79c070da74d0e67aa6b2 /tmp/foo.txt

Now, compare the output of the following commands:

$ sort /tmp/foo.txt
$ LC_ALL='en_US' sort /tmp/foo.txt
$ LC_ALL='en_US.utf8' sort /tmp/foo.txt
$ LC_ALL='en_US.iso88591' sort /tmp/foo.txt
$ LC_ALL='C' sort /tmp/foo.txt
$ LC_ALL='de_DE.utf8' sort /tmp/foo.txt

How’s that for rocking your world? So, the next time your friend says “hey, can you return those results sorted for me?” then you’ll have something really fun to think about when you can’t sleep at night.

And just when you thought “Oh, well great, at least all the UTF-8 versions sort the same” then comes along this little gem:

$ LC_ALL="jp_JP.utf8" sort /tmp/foo.txt

Oh, and just when you thought “Well, I guess I’ll be OK with en_US.utf8, and at least English will sort the way I want worldwide!” along come your friends to the North with this awesome zinger:

$ LC_ALL="en_CA.utf8" sort /tmp/foo.txt
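If you’d rather poke at this from Python, here is a small sketch that rebuilds the same 17 lines from the hexdump bytes. Plain sorted() compares code points, which for UTF-8 text is the same as the byte order that LC_ALL=C sort uses. locale.strxfrm asks the C library for the collation rules of a named locale instead. The locale names in the demo loop are assumptions about what is installed on your machine, and this sketch hasn’t been checked against GNU sort line for line.

```python
import locale
from typing import List, Optional

# The exact 45 bytes shown in the hexdump above.
DATA = bytes.fromhex(
    "410ac39f0a430ac39f610ac39f7a0a61"
    "0a420a620a630a530a730a53530a7373"
    "0a53410a73610a535a0a737a0a"
)
LINES = DATA.decode("utf-8").splitlines()


def c_sort(lines: List[str]) -> List[str]:
    """Code-point order; for UTF-8 text this equals byte order (LC_ALL=C)."""
    return sorted(lines)


def locale_sort(lines: List[str], name: str) -> Optional[List[str]]:
    """Sort using the collation rules of locale `name`, or None if unavailable."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None  # that locale isn't installed on this machine
    try:
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


if __name__ == "__main__":
    print("C:", c_sort(LINES))
    for name in ("en_US.UTF-8", "de_DE.UTF-8", "en_CA.UTF-8"):
        print(name + ":", locale_sort(LINES, name))
```

Because strxfrm goes through the same C library locale data that sort uses, differences between locales should show up here too; a None just means that locale isn’t installed.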

Programming challenge: Semi-sort a list of random numbers.

Here’s a programming challenge / interview question that I like to think about. It gives me that tingly feeling of “I think there’s a really clever, efficient algorithm for this,” but I haven’t been able to come up with a really clever answer yet. Here’s the problem:

Given a file containing N random positive integers less than N, write a program that runs in O(N) time and splits the input data into a collection of output files, each of which is in strictly sorted order, using as few files as possible.

Give it a try, send me the code, and we can compare algorithms. I’ve been working with N=10000 and have a solution that produces about 200 sorted files, and it can get down to about 160 sorted files if I allow a fixed, constant amount of extra space (i.e. a small internal buffer).
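For comparison, here is one approach sketched in Python. It is not the solution described above: it swaps in the greedy rule from patience sorting, and it assumes the “files” are in-memory lists. Each value goes onto the run whose last element is the largest one still below it; if no run qualifies, the value starts a new run. Keeping the run tails in a sorted list makes each step a binary search, so this is O(N log k) for k runs rather than strictly O(N).

```python
import bisect
import random
from typing import List, Tuple


def semi_sort(values: List[int]) -> List[List[int]]:
    """Split values into strictly increasing runs in one left-to-right pass."""
    runs: List[List[int]] = []
    # neg_tails[i] == -runs[i][-1]. It stays in ascending order, which means
    # the real tails are in descending order: run 0 has the largest tail.
    neg_tails: List[int] = []
    for x in values:
        # First run whose tail is strictly below x (i.e. -tail > -x). Since
        # tails descend, that is the run with the largest tail below x.
        i = bisect.bisect_right(neg_tails, -x)
        if i == len(neg_tails):
            # Every tail is >= x, so x can't extend any run: open a new one.
            runs.append([x])
            neg_tails.append(-x)
        else:
            runs[i].append(x)
            # Still ascending: neg_tails[i-1] <= -x < old neg_tails[i].
            neg_tails[i] = -x
    return runs


def demo(n: int = 10000, seed: int = 0) -> Tuple[List[int], List[List[int]]]:
    """N random positive integers below N, as in the challenge."""
    rng = random.Random(seed)
    values = [rng.randrange(1, n) for _ in range(n)]
    return values, semi_sort(values)


if __name__ == "__main__":
    values, runs = demo()
    print(len(values), "values ->", len(runs), "strictly sorted runs")
```

As in patience sorting, this greedy rule should give the fewest possible strictly increasing runs: the count equals the length of the longest non-increasing subsequence of the input, which for random input grows on the order of 2√N.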

I like to think of this operation as “semi-sorting”.  The output is a collection of sorted files, which can be merged together by a traditional merge operation.
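Merging the runs back together is a standard k-way merge. A minimal Python sketch using the standard library’s heapq.merge, which reads its inputs lazily; the one-integer-per-line file format and the helper names are assumptions for illustration.

```python
import heapq
from typing import Iterator, List


def write_ints(path: str, values: List[int]) -> str:
    """Write one integer per line (hypothetical run-file format)."""
    with open(path, "w") as f:
        f.writelines("%d\n" % v for v in values)
    return path


def read_ints(path: str) -> Iterator[int]:
    """Lazily yield the integers stored in one run file."""
    with open(path) as f:
        for line in f:
            if line.strip():
                yield int(line)


def merge_files(paths: List[str]) -> Iterator[int]:
    """k-way merge of already-sorted files.

    heapq.merge keeps roughly one pending value per input on a small heap,
    so memory grows with the number of files, not with their size.
    """
    return heapq.merge(*(read_ints(p) for p in paths))
```

For example, merging files holding [1, 4, 9], [0, 2, 3] and [2, 5, 6] yields 0, 1, 2, 2, 3, 4, 5, 6, 9.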

Real-world Python deployment using pip & virtualenv. (Outline)



You’ve got a great development setup and now you want to “do the right thing” in production.

You’re using virtualenv (good!)

You’re using “pip install…” for all your dependencies (good!)

You’re probably not keeping a requirements.txt up to date (that’s OK!)

You’re using “django-admin.py runserver” or similar (not gonna cut it!)

You’ve got all your source code in a git repo (self-hosted, GitHub, or other; good!)

You’re ready to write your first fabric script! (good!)

Now let’s get that code out to production!


  1. Deploy a git repository to production.
  2. Let’s not use eggs for our own source.
    1. This is a debatable point, but for specific use cases like deploying a Django application, building and maintaining eggs is harder than it should be (mainly because of static resources).
    2. Deploying from a source directory is actually more straightforward.
    3. We’ll still use “python ./setup.py install” for our own code.
  3. Use virtualenv for environment management.
  4. Use pip to install dependencies.
  5. Reproducible deployment.
  6. No external network dependencies.
  7. Fast-ish deployment & dependency installation.
  8. Match development & production environments as closely as possible.
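For the virtualenv goals, here is a minimal sketch of the “new environment on every deploy” idea that the outline returns to below. It swaps in Python 3’s built-in venv module for the virtualenv tool, and the releases/<timestamp>/env layout and the new_release_env name are hypothetical. EnvBuilder leaves system site-packages out by default, which lines up with not relying on system packages.

```python
import time
import venv
from pathlib import Path


def new_release_env(releases_dir: str, with_pip: bool = True) -> Path:
    """Create a fresh virtual environment for a single deployment.

    Hypothetical layout: <releases_dir>/<UTC timestamp>/env. Nothing is
    reused from earlier deploys, so every release starts from a clean slate.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime())
    release_dir = Path(releases_dir) / stamp
    # exist_ok=False: fail loudly rather than reuse an existing release.
    release_dir.mkdir(parents=True, exist_ok=False)
    env_dir = release_dir / "env"
    # system_site_packages defaults to False: don't rely on system packages.
    venv.EnvBuilder(with_pip=with_pip).create(str(env_dir))
    return env_dir
```

Calling new_release_env("/srv/myapp/releases") once per deploy (a made-up path) gives each release its own environment; installing dependencies into it is a separate step.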


  1. Think about security for just a couple seconds.
    1. ssh keys in production?
    2. Could an attacker gain access to your git repository?
  2. “pip install” is a heavyweight process.
    1. Goes out to pypi.python.org and fetches metadata.
    2. Fetches each package from its own home hosting provider.
    3. These hosts go down.  Do you want YOUR deployment to depend on their servers being up?
  3. Cut your external network access and see what happens.
    1. Imagine if outbound network access from production was disallowed.  Could you still deploy?
  4. How do I rollback?
  5. How do I manage my system configuration?
    1. apache configs
    2. nginx configs
    3. gunicorn
    4. crontabs
    5. init scripts (start/stop, etc)
    6. supervisord configs
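One answer to the network question, which the outline comes back to below, is to split “download this package” from “install this package.” The pip flags quoted there (--no-install, --use-mirrors) come from older pip releases that current pip no longer accepts, so this sketch swaps in the modern equivalents, pip download --dest and pip install --no-index --find-links. The helper names and the python parameter are hypothetical; when installing, pass the new virtualenv’s interpreter.

```python
import shlex
import subprocess
import sys
from typing import List


def download_cmd(cache_dir: str, requirements: str,
                 python: str = sys.executable) -> List[str]:
    """Step 1, run where network access is allowed: fetch everything once."""
    return [python, "-m", "pip", "download",
            "--dest", cache_dir, "-r", requirements]


def install_cmd(cache_dir: str, requirements: str,
                python: str = sys.executable) -> List[str]:
    """Step 2, safe to run in production: never contact a package index."""
    return [python, "-m", "pip", "install",
            "--no-index", "--find-links", cache_dir, "-r", requirements]


def run(cmd: List[str]) -> None:
    """Echo the command, then run it; raise if pip fails."""
    print("+", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Any source distributions in the cache still get built at install time, so production may need a compiler for packages that ship C extensions.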


  1. Security & key management
    1. Never ssh from production to anywhere.  Only ssh into production.
    2. Production machines should never have private keypairs.  (authorized_keys is OK)
  2. git access in production
    1. Use the “push-pull” strategy.
    2. Your development machine does “git push” into a bare git repo in production
    3. Production machine then turns around and does “git pull” from its own local repository.
    4. Or, build some eggs. (This has its own issues that I won’t cover here)
    5. You can modify code in production, and commit it, but it won’t make it back to your repository unless you “git pull” from that repo.  This is a good thing.  Manage production customization in a reproducible way.
  3. users
    1. don’t deploy as root
  4. virtualenv & pip
    1. virtualenv is great!
    2. don’t rely on system packages!
    3. Make a new virtualenv every time you deploy!
    4. never run “sudo virtualenv…”
    5. never run “sudo pip…”
    6. PIP_DOWNLOAD_CACHE is NOT your friend.
      1. why with examples
    7. Solution: Separate “install this package” from “download this package”
      1. Step 1: pip install --no-install --use-mirrors -I --download=$CACHE_DIR …
      2. Step 2: pip install --no-index --index-url=file:///dev/null …
    8. Use some helper scripts to make this easier. (github link TBD)
    9. You’re still screwed sometimes.
      1. outline when TBD
    10. Automatic dependency downloads can still bite you
    11. Periodically re-download everything
      1. This makes sure that if dependencies change, you pick up the change.
    12. Managing package upgrades.
      1. When you want to upgrade, re-download the package and you’ll pick up the latest version.
      2. modify requirements.txt
  5. Managing your system configurations
    1. In principle: Make a “mock /etc” in your repo
    2. Copy “mock /etc” on top of the “system /etc” to install. (using fabric)
    3. A couple of other commands to enable system services (a2ensite and friends, etc.)
    4. supervisord, but it’s outside the scope of this talk
      1. system supervisord or a self-installed supervisord?
      2. How to start up supervisord?
      3. web interface to production or not?
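The “mock /etc” idea boils down to a recursive copy. A minimal local sketch, assuming a hypothetical deploy/etc/ directory in the repo that mirrors the real /etc layout. In the outline this step runs on the server via fabric, which this sketch doesn’t use. Pointing target_root at a scratch directory is a cheap way to preview what would change, and commands like a2ensite still have to run afterwards.

```python
import shutil
from pathlib import Path
from typing import List


def install_mock_etc(mock_etc: str, target_root: str = "/") -> List[Path]:
    """Copy every file under the repo's mock etc/ onto <target_root>/etc.

    Files present in the mock tree overwrite the system copy; files that
    exist only on the system are left alone. Returns the paths written.
    """
    src = Path(mock_etc)
    dest = Path(target_root) / "etc"
    written: List[Path] = []
    for path in sorted(src.rglob("*")):
        if path.is_file():
            target = dest / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy contents plus permissions/mtime
            written.append(target)
    return written
```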

Apple vs. Samsung: The cost of Android fragmentation.

I’ve commented in the past that Android fragmentation isn’t as huge an issue as some developers have made it out to be. But there is one elephant in the room that no one is talking about:

Android fragmentation and lack of updates increased damages in the Apple vs. Samsung lawsuit.

How?  Why?  Well, the sad truth of it is that Android is evolving over time to be “less infringing” on Apple patents.  One great example of this is the “bounce-back scrolling feature” patent.  This feature did not exist in Android 1.x, was implemented (poorly, I might add) in Android 2.x, and was then removed in Android 3.x and 4.x and replaced with a non-infringing color-overlay scrolling feedback mechanism.

So, if Samsung (or any other vendor) had been able to keep devices up to date more quickly, they would have been less liable to Apple for damages.  Similarly, if they had been quicker to adopt new Android versions (say, 4.0) then they would not be liable for any damages against this patent.

There are many other examples of Google evolving the Android user interface to NOT infringe on Apple patents. This is the first case I can think of where fragmentation (more specifically: not keeping software versions up to date) has cost anyone real money. By the books, it’s Samsung who’s paying, and maybe this means they’ll be more aggressive about keeping up to date.

I also hope Google sees this lesson, and helps the hardware manufacturers with hardware drivers and other issues that hold back software on many devices.

Amazed at how many different Mars gallery interfaces there are.

Here, have this pile of links:

I’m going to keep updating this list as I find more, so check back.  I’ve already added 3 new links since I first wrote this.

Supercharge your bash prompt with git status goodness.

Here’s a thought:

Wouldn’t it be awesome if your bash prompt could show you:

  • Your current working directory.
  • Which git repository you’re currently in.
  • Which git branch you’re currently on (if not master).
  • How many outstanding files you have (files that need to be added or committed).
  • How many changes ahead (or behind) origin/HEAD you currently are.
  • Your current virtualenv (for Python development, but doesn’t hurt other languages)

Well, all this is possible (and more, probably!). I worked a bit on getting all these features working this afternoon. The source code is pretty rough, but I think it could be useful enough for others that I should start to share it. I’ll likely put this in its own GitHub repository eventually. But for now, here’s a simple gist with my ~/.bash_prompt source.

To use this, just copy it to your home directory, and add the following to the bottom of your ~/.bashrc:

source ~/.bash_prompt
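The gist itself isn’t reproduced here. As a rough illustration of where most of the prompt data can come from, here is a Python sketch that parses git status --porcelain --branch: the “## ” header line carries the branch plus ahead/behind counts, and every other line is one outstanding file. Two assumptions worth flagging: git reports ahead/behind against the branch’s configured upstream, not origin/HEAD, and the *, +, - markers and function names are made up for this example. Running Python on every prompt is also slower than doing the work in pure bash.

```python
import os
import re
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class GitStatus:
    branch: Optional[str] = None
    ahead: int = 0
    behind: int = 0
    dirty: int = 0  # outstanding files: modified, staged, or untracked


def parse_porcelain(output: str) -> GitStatus:
    """Summarize `git status --porcelain --branch` output."""
    status = GitStatus()
    for line in output.splitlines():
        if line.startswith("## "):
            # e.g. "## feature...origin/feature [ahead 2, behind 1]"
            head, _, track = line[3:].partition(" [")
            status.branch = head.split("...", 1)[0]
            ahead = re.search(r"ahead (\d+)", track)
            behind = re.search(r"behind (\d+)", track)
            status.ahead = int(ahead.group(1)) if ahead else 0
            status.behind = int(behind.group(1)) if behind else 0
        elif line.strip():
            status.dirty += 1
    return status


def prompt_segment(status: GitStatus, virtual_env: Optional[str] = None) -> str:
    """Build the prompt fragment; the branch is hidden when it is master."""
    parts = []
    if virtual_env:
        parts.append("(" + os.path.basename(virtual_env.rstrip("/")) + ")")
    if status.branch and status.branch != "master":
        parts.append(status.branch)
    if status.dirty:
        parts.append("*%d" % status.dirty)
    if status.ahead:
        parts.append("+%d" % status.ahead)
    if status.behind:
        parts.append("-%d" % status.behind)
    return " ".join(parts)


def current_segment() -> str:
    """Run git in the current directory; empty string outside a repository."""
    try:
        out = subprocess.run(["git", "status", "--porcelain", "--branch"],
                             capture_output=True, text=True)
    except OSError:
        return ""  # git isn't installed
    if out.returncode != 0:
        return ""  # not inside a git repository
    return prompt_segment(parse_porcelain(out.stdout),
                          os.environ.get("VIRTUAL_ENV"))
```

VIRTUAL_ENV is the variable virtualenv’s activate script sets, which is why the sketch reads it for the environment name.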

Adding custom launchers to Gnome3’s Favorites.

This is totally non-obvious, so here goes.

At a shell prompt, run:

$ gnome-desktop-item-edit ~/.local/share/applications/mylauncher.desktop --create-new

Go through the dialog to create the launcher, and make sure you give it an easy-to-remember name. When you’re done, that application should show up under “Applications” in the GNOME Shell search.
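If you’d rather skip the dialog, the launcher is just a small text file in the Desktop Entry format. A minimal sketch that writes one; the write_launcher helper, the slug-style file name, and the example command are hypothetical, while Type, Name, Exec, Terminal, and Icon are standard Desktop Entry keys.

```python
from pathlib import Path


def write_launcher(name: str, command: str, icon: str = "",
                   apps_dir: str = "~/.local/share/applications") -> Path:
    """Write a minimal .desktop file for a custom launcher."""
    directory = Path(apps_dir).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    # "My Tool" -> "my-tool.desktop": keep letters/digits, dash the rest.
    slug = "".join(c if c.isalnum() else "-" for c in name.lower())
    path = directory / (slug + ".desktop")
    lines = [
        "[Desktop Entry]",
        "Type=Application",
        "Name=" + name,
        "Exec=" + command,
        "Terminal=false",
    ]
    if icon:
        lines.append("Icon=" + icon)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

For example, write_launcher("My Tool", "/usr/bin/mytool --flag") would create ~/.local/share/applications/my-tool.desktop.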
