ThinkPad T520 Wireless Drivers for Ubuntu 11.04

Just got a new laptop, a ThinkPad T520, and I’m running Ubuntu 11.04 (Natty).

With no additional drivers, the wireless (RealTek 8188ce) would work somewhat intermittently.  Sometimes it was great, and other times it wouldn’t associate at all.

The correct fix is to connect to a wired network, then add a PPA for the RealTek compiled drivers, and add the drivers themselves, like this:

sudo add-apt-repository ppa:lexical/hwe-wireless
sudo apt-get update
sudo apt-get install rtl8192ce-dkms

After doing that, everything works great. I hope this works for you as well!

(Found this after Googling for quite a while.  Source: http://ubuntuforums.org/showthread.php?t=1580036&page=2)

Posted in General | 5 Comments

A quick lesson in how to subvert the democratic process.

Imagine that you wanted to subvert an election.  What do I mean by “subvert”?  I mean that you wanted everyone involved in the process to think that the outcome reflected a fair majority when, in fact, it did not.  Let’s go through some options on how to make this happen.

Here are some quick parameters for this thought experiment:

  • There will be exactly 9 people voting.
  • Each person gets to vote for either Option A, or Option B. These will simply be called “A” and “B”.
  • Each person must continue to believe that their vote was counted, and was contributing to a majority of votes.
  • 5 of the people will vote for “A”
  • 4 of the people will vote for “B”

Algorithm 1: Simple majority

Let’s just count all 9 votes together.

  • Count 5 people voting for “A”
  • Count 4 people voting for “B”
  • 5/9 ≈ 55.6% of people voted for A.

We declare the winner to be “A”.

Algorithm 2: Representative majority.

Let’s divide the 9 people into 3 possible groups of 3 people each. Call these groups “P”, “Q”, and “R”.

The voting algorithm is as follows:

  • Each group of 3 voters casts its votes, and a majority winner of that group is declared.
  • Then, each group acts as a “virtual voter” and a second election is held.  Each “virtual voter” (one for each group) votes strictly according to the majority of the actual voters in that group.
  • Whichever option gets at least 2 of the 3 virtual votes from P, Q & R is declared the winner.

There are a few different options for the outcomes of the sub-votes in P,Q, and R, and it’s worth showing them here:

Algorithm 2, Possibility 1:

People are divided into groups and votes are cast in each group as follows:

  • Group P: A, A, A  (Winner: A)
  • Group Q: A, A, B (Winner: A)
  • Group R: B, B, B (Winner: B)

“A” is declared the winner.

Algorithm 2, Possibility 2:

People are divided into groups and votes are cast in each group as follows:

  • Group P: A, A, B (Winner: A)
  • Group Q: A, A, B (Winner: A)
  • Group R: A, B, B (Winner: B)

“A” is declared the winner.

Algorithm 2, Possibility 3:

People are divided into groups and votes are cast in each group as follows:

  • Group P: A, A, A  (Winner: A)
  • Group Q: A, B, B  (Winner: B)
  • Group R: A, B, B  (Winner: B)

“B” is declared the winner.
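The three possibilities above can be checked with a few lines of Python. This is only a sketch: the function names are mine, and each group is written as a string of its members' votes.

```python
from collections import Counter

def majority(votes):
    """Return the option with the most votes (an odd count means no ties)."""
    return Counter(votes).most_common(1)[0][0]

def representative_winner(groups):
    """Each group casts one 'virtual vote' for its own majority winner."""
    virtual_votes = [majority(group) for group in groups]
    return majority(virtual_votes)

# The three groupings listed above, as groups P, Q, R.
possibilities = {
    1: ["AAA", "AAB", "BBB"],
    2: ["AAB", "AAB", "ABB"],
    3: ["AAA", "ABB", "ABB"],
}

for number, groups in sorted(possibilities.items()):
    all_votes = "".join(groups)  # the same 9 ballots, counted together
    print(number, "simple:", majority(all_votes),
          "representative:", representative_winner(groups))
```

The simple count is the same in every case, because the same 9 ballots are cast. Only the grouping changes.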

Conclusion

As you can see, the group selection process opens up a loophole in the voting, where 1 of the 3 groupings shown produces a winner that the actual voter majority did not choose.

If, prior to the group selection process, we can somehow know which voters are going to vote for A, and which for B, we can control or modify that process (which is likely opaque to the individual voters) in a way that influences the final outcome significantly in favor of B.
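To get a feel for how much room the group selection leaves, here is a rough sketch that tries every way of assigning the 9 voters to groups P, Q and R and tallies the final winner. The voter numbering is my own assumption (voters 0-4 vote A, voters 5-8 vote B).

```python
from itertools import combinations

# Assumed numbering: voters 0-4 vote "A" and voters 5-8 vote "B".
VOTES = "AAAAABBBB"

def group_pick(members):
    """Majority choice of one 3-person group."""
    b_count = sum(1 for voter in members if VOTES[voter] == "B")
    return "B" if b_count >= 2 else "A"

def count_outcomes():
    """Try every assignment of 9 voters to groups P, Q, R and tally the winner."""
    tally = {"A": 0, "B": 0}
    voters = set(range(9))
    for p in combinations(sorted(voters), 3):
        rest = voters - set(p)
        for q in combinations(sorted(rest), 3):
            r = tuple(sorted(rest - set(q)))
            picks = [group_pick(group) for group in (p, q, r)]
            winner = "B" if picks.count("B") >= 2 else "A"
            tally[winner] += 1
    return tally

tally = count_outcomes()
print(tally)
```

Of the 1680 possible groupings, 360 hand the win to B. So whoever draws the groups, and knows the votes in advance, has plenty of B-winning arrangements to pick from.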

The election has been subverted, because each individual voter thinks “I voted, and in my group, a majority consensus was reached, and this directly contributed to the final majority outcome.”  Voters are lulled into thinking that their votes are “representative” of their group, and that the final outcome is “representative” of the majority of all votes.

This is how congressional redistricting and the electoral college voting process work.

Posted in General | 1 Comment

Simulated HDR look in The Gimp

I had these instructions written up in about 2002 (at least that’s the timestamp on the HTML) and I’ve since nuked that section of my site, so I’m replicating it here.  If you want that “simulated HDR” look in your photos, here’s an easy way to get it:

  1. Create 3 layer copies of your original photo.
  2. Set the top layer mode to “Saturation”
  3. Set the middle layer mode to “Overlay”
  4. Desaturate the middle layer
  5. Invert the middle layer
  6. Gaussian Blur the middle layer with a large radius blur (10-20% of your maximum image dimension is a good choice.)
  7. Duplicate the middle layer if you want “more effect”
  8. Adjust the middle layer opacity if you want “less effect”

This technique will bring up the levels of the dark parts while not blowing out the bright parts.  It’ll do it in a way that’s much more subtle than “levels.”   If you duplicate the middle layer 2-3 times (or more) you’ll get that funny high-contrast HDR look from a single photo.

I’m not saying it’s the way you want your photos to look, but to each his own.
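If you're curious why this works, here's a tiny per-pixel sketch. It uses the textbook Overlay formula on values from 0.0 to 1.0, which is an assumption on my part: GIMP's own Overlay mode may compute things differently depending on the version. The `local_mask` helper is a hypothetical stand-in for the desaturated, inverted, blurred middle layer.

```python
def overlay(base, blend):
    """Textbook Overlay blend for channel values in the range 0.0-1.0."""
    if base < 0.5:
        return 2.0 * base * blend
    return 1.0 - 2.0 * (1.0 - base) * (1.0 - blend)

def local_mask(neighbourhood_brightness):
    """Stand-in for the desaturated, inverted, blurred middle layer."""
    return 1.0 - neighbourhood_brightness

# A dark pixel in a dark region, and a bright pixel in a bright region.
shadow = overlay(0.2, local_mask(0.2))     # mask is bright, so the pixel is lifted
highlight = overlay(0.8, local_mask(0.8))  # mask is dark, so the pixel is pulled down
print(shadow, highlight)
```

Because the mask is blurred with a large radius, it follows the overall brightness of a region rather than individual pixels, so local detail survives while shadows come up and highlights come down.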

For example, you can easily go from this:

Original

to this “Simulated HDR” image:

Simulated HDR, contrast enhanced.

Posted in General | Leave a comment

Passing of a namesake

My thoughts and prayers today are going out to family and friends of Googler and namesake Steve Lacey who passed away this weekend.  You’ll be sadly missed.

Posted in General | Leave a comment

New Google black navigation bar images.

I’ve taken a thin vertical slice of two Google properties below, and horizontally tiled it.  This accentuates how out of place the black bar looks in the Chrome world of blue & white.  I wonder how Google is going to address this branding inconsistency across its products.

Google Reader

 

Google Search

 

Posted in General | 1 Comment

Proposed scheme for per-user database field encryption.

I’ve been thinking a lot about hackers, stolen passwords, rainbow tables, and credit card numbers in databases.

But, the question remains:  “How should I store credit card numbers in a database for maximum user security?”

Typically, user authentication for web sites looks like this:

  1. User types username & password into text box on a web page.
  2. Username & password are sent (ideally, via https) to the web server.
  3. The server hashes the password with a known secret (salt) to generate a password hash.
  4. The server compares the password hash to the hash value stored in the database.  If successful, a random session ID is generated and sent back to the client as an HTTP header, as well as stored in a local store for referencing in future requests.

Key points:  Plain text passwords aren’t stored in the database.  Passwords are never sent unencrypted (if using https) and session ID’s are “unguessable” because they’re randomly chosen from a large space.

If your web application needs to store sensitive user information, like credit card numbers, how should you do it?  I propose using per-user encryption keys that are based on a salted hash of the plain-text password.

How do you compute the correct keys, and where should you store them?  One possibility is to use another secure hash of the user’s plain text password and some other salt value that’s not the same as the one used for password checking.   Let’s call the resultant value our user encryption key.  It should never be stored in persistent storage, but should be sent back to the user as an HTTP header.

The user, with every request, will then be sending to the server a random session ID as well as their current user encryption key value.  The user encryption key would be used with a symmetric encryption algorithm to store any sensitive database values.  (Home address, credit card numbers, other site credentials, etc.)  When the user changes their password, the encrypted database fields need to be re-encrypted with the new key.

As the developer of the server software, you need to be extra careful to only store unencrypted sensitive information in RAM, and only use it for the minimum duration possible.   For example, it would be difficult to impossible to implement offline billing with such a system in place.

With this scheme in place, your customers’ data will be completely secure, even if your server is hacked and even if your entire database is replicated.  Even if the hackers had access to your source code, they could, at most, get the unencrypted data for users who were currently visiting the site (through net sniffing the user encryption key values) or from users whose PCs have been compromised.   That feels pretty secure to me.

The resultant flow for user sign-in looks like this:

  1. Receive plaintext username and password
  2. Hash password with “password salt” and check against stored value for user authentication and session generation.
  3. Hash password with “user key salt” and set an HTTP header with this value as the user encryption key.
  4. When user reads or writes sensitive information, encrypt/decrypt it with the user encryption key.
  5. Never store decrypted values in persistent storage. (some asynchronous task queues may violate this constraint, so be careful)
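
Steps 1-3 of the flow above might look roughly like this in Python. It's a sketch under assumptions: the post only says “hash”, so PBKDF2 with SHA-256, the two salt values, and the iteration count are all my own choices. The actual encryption in step 4 is left out, since that should come from a vetted crypto library rather than a blog snippet.

```python
import hashlib
import hmac

# Hypothetical values: two different salts, as the scheme requires.
PASSWORD_SALT = b"salt-for-password-checking"
USER_KEY_SALT = b"salt-for-encryption-keys"
ITERATIONS = 100000  # assumed work factor

def derive(password, salt):
    """Salted, stretched hash of the plain-text password (32 bytes)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def sign_in(password, stored_password_hash):
    """Return the user encryption key on success, or None if the password is wrong."""
    candidate = derive(password, PASSWORD_SALT)
    if not hmac.compare_digest(candidate, stored_password_hash):
        return None
    # Never written to persistent storage; handed back to the client instead.
    return derive(password, USER_KEY_SALT)

stored = derive("correct horse", PASSWORD_SALT)  # what the database would hold
user_key = sign_in("correct horse", stored)
```

Because the two salts differ, the value stored in the database for authentication tells an attacker nothing directly about the user encryption key.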
Posted in General | 4 Comments

Django + PostgreSQL + virtualenv Development setup for Windows 7

Here’s what you need to do Django development on Windows 7. As I go through the install, I’m writing down all the steps to make sure that I don’t miss any. I’m going to focus on:

  • Python 2.7.1 from python.org
  • virtualenv (manages python packages and dependencies)
  • Visual Studio 2008 (for compiling Python addons)
  • PostgreSQL 8.4 (database engine)
  • Django 1.3 (our web framework)
  • Windows psycopg2 installer

It will be possible to install any other requirements (PIL, etc.) using pip after the virtualenv is set up.

Note on 32-bit versus 64-bit

Most modern computers these days are 64-bit capable, and will (usually) be running a 64-bit operating system.  In addition, they can also run older 32-bit binaries.  When you install all the components below, you must choose either 32-bit or 64-bit and use that same choice for every component.

Install Visual C++ 2008 Express Edition with SP1

This is the compiler tool needed to build other python extensions we’ll be adding later. Download it from Microsoft.   Select “Visual C++ 2008 Express Edition with SP1”, choose your Language and click the “Free Download” button.

You must use the 2008 edition, since that’s the compiler that was used to build Python 2.7.1, and the compiler versions must match for the additions to be compatible.  If you already have a newer version of Visual Studio installed, please also install the 2008 edition I’ve linked to.

You do not need to install the Silverlight runtime or Microsoft SQL Server, unless you think you want these for other purposes.

This compiler is only 32-bit capable, so we’ll be sticking to 32-bit Python below.  This shouldn’t cause any issues for most development installs.  If you have an official purchased version of Visual C++ 2008 that’s 64-bit, then you’re on your own.

Install 32-bit Python 2.7.1 from python.org

Start on python.org and select “Python 2.7.1 Windows Installer”.  Please do not choose the x86-64 Installer, as it isn’t compatible with the compiler from above.

Download and install python-2.7.1.msi

  • Select “Install for all users”
  • Use the default location of “C:\Python27” (note: no period)
  • Use the default options on “Customize Python”.
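
If you want to double-check that you ended up with the 32-bit interpreter, a quick sketch like this (run with the freshly installed python) reports the word size. It relies only on the standard `struct` module.

```python
import struct
import sys

# Pointer size in bytes times 8 gives the interpreter's word size.
bits = struct.calcsize("P") * 8
print("Python %d.%d, %d-bit" % (sys.version_info[0], sys.version_info[1], bits))
```

For the setup described here, you'd want this to report 32-bit.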

Install setuptools from python.org

setuptools is a Python package that facilitates installing other packages (and thus, bootstrapping your install system).  We’ll use setuptools to install other packages, but first we need to install it.  Get it from python.org.  I’m using “setuptools-0.6c11.win32-py2.7.exe”  Download and run that binary, and use the default options in the installer.

Install PostgreSQL 8.4

This will be your database server.  We use this version because it mirrors what we use in the production environment.  Select the 8.4.8-1 installer for Windows.

Run the installer and use the default install options. You’ll need to choose a password for the postgres user.  Choose something you won’t forget.

You do not need to run the “Stack Builder” tool.  Un-check that option and finish your install.

Install psycopg2 for windows

psycopg2 is the interface API from PostgreSQL to Python.  Unfortunately, it’s not packaged in a way that’s easy to install automatically, so you have to download and install it.  Choose the proper version for your Python (likely 2.7, 32-bit as we’ve discussed  before).  Here’s a link to all the available packages.

Install virtualenv

virtualenv is a Python tool that helps us manage and install Python packages in a neat and clean way.  It also helps with installing these packages on other systems (like our deployment environment, which is likely Linux). Run:

C:\>cd \Python27\Scripts
C:\Python27\Scripts>easy_install virtualenv

From this point forward, we won’t use easy_install anymore.

Create a development environment using virtualenv

The virtualenv tool will create an “environment” where you can install any needed Python packages, like Django.  This environment contains specially modified versions of Python and other tools that make dependency management much, much easier.  Choose a directory for your environment.  I like to put things in /Home/<username>/Desktop/src/<environment_name>  for easy access.

C:\Users\Steve Lacy\Desktop>mkdir src
C:\Users\Steve Lacy\Desktop>cd src
C:\Users\Steve Lacy\Desktop\src>mkdir test
C:\Users\Steve Lacy\Desktop\src>cd test
C:\Users\Steve Lacy\Desktop\src\test>\Python27\Scripts\virtualenv --no-site-packages --distribute env

This will create an environment in src\test\env and populate it with modified versions of Python, pip, etc.

You’ll need to “activate” this environment to start using it.  virtualenv puts in a simple activate script to help you do this, like this:

C:\Users\Steve Lacy\Desktop\src\test>env\Scripts\activate.bat
(env) C:\Users\Steve Lacy\Desktop\src\test>

Note how your prompt changed to say that you’re using development environment “env” which we named above.

Now that our environment is “active” we have easy access to the special versions of python and pip that have been placed in env/Scripts.  They’re now on your PATH, so you can use them directly.
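
If you're ever unsure whether the environment is really active, here's a small sketch you can run with the `python` on your PATH. The function name is mine. Classic virtualenv sets `sys.real_prefix`, while newer tools set `sys.base_prefix`, so the sketch checks both.

```python
import sys

def in_virtualenv():
    """True when this interpreter belongs to a virtual environment."""
    # Classic virtualenv sets sys.real_prefix; newer tools set sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix

print(in_virtualenv(), sys.prefix)
```

When the environment is active, `sys.prefix` should point inside your `env` directory.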

Install Django using pip

This is fairly easy now that we’re in our environment and it’s active:

(env) C:\Users\Steve Lacy\Desktop\src\test>pip install django

Inspect the terminal output carefully to make sure that it successfully installs each of these. Once that’s complete, you should be good to go.
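
Besides reading the pip output, you can ask Python directly whether the packages are importable from the active environment. A minimal sketch (the helper name is hypothetical):

```python
import importlib

def installed_version(module_name):
    """Return a version string for an importable module, or None if it is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    get_version = getattr(module, "get_version", None)  # Django exposes this
    if callable(get_version):
        return str(get_version())
    return str(getattr(module, "__version__", "unknown"))

for name in ("django", "psycopg2"):
    print(name, installed_version(name))
```

A `None` next to either name means that package isn't visible to this interpreter, which usually points at an inactive environment or a failed install.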

You can create a new Django project (named “mysite” here, as an example) by running:

(env) C:\Users\Steve Lacy\Desktop\src\test>python env\Scripts\django-admin.py startproject mysite

From here, your best bet is to continue to the Django tutorial and introductions.

Posted in General | 2 Comments

Quick experiment: A QR-Code clock.

What will happen if you use your phone’s QR code scanner to try to scan a QR code that’s not static?  For example, what if you implemented a clock as a QR code?

Posted in General | 1 Comment

git pull says “You are not currently on a branch…”

Was working through some git error messages generated by pip installs of some Python code, and found that the issue was caused by this error:

$ git pull
You are not currently on a branch, so I cannot use any
'branch.<branchname>.merge' in your configuration file.
Please specify which remote branch you want to use on the command
line and try again (e.g. 'git pull <repository> <refspec>').
See git-pull(1) for details.

The solution (in this case, per the above directory) is to run:

$ cd /path/to/git/repository/from/above
$ git checkout master
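
If you want to check for this state from a script before pulling, here's a rough sketch that reads `.git/HEAD` directly. It assumes an ordinary checkout where `.git` is a directory (not a submodule or linked worktree), and the function name is mine.

```python
import os

def current_branch(repo_path):
    """Return the checked-out branch name, or None when HEAD is detached."""
    with open(os.path.join(repo_path, ".git", "HEAD")) as head_file:
        head = head_file.read().strip()
    prefix = "ref: refs/heads/"
    if head.startswith(prefix):
        return head[len(prefix):]
    # HEAD holds a raw commit id, so "git pull" has no branch to merge into.
    return None
```

A `None` result is the “not currently on a branch” situation from the error message above.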

I wish pip didn’t cause this problem. When you specify a git repository on the command line, this should happen automatically. Maybe this has been fixed in pip 1.0 which was just released today.

Posted in General | 2 Comments

What Larry Page really needs to do to return Google to its startup roots

I worked at Google from 2005-2010, and saw the company go through many changes, and a huge increase in staff.  Most importantly, I saw the company go from a place where engineers were seen as violent disruptors and innovators, to a place where doing things “The Google Way” was king, and where thinking outside the box was discouraged and even chastised.  So, here’s a quick list of things I think Larry could do to bring the startup feel back to Google:

Let engineers do what they do best, and forget the rest.

This is probably the most important single point.  Engineers at Google spend way too much time fussing about with everything other than engineering and product design.  Focusing on shipping great, innovative products needs to be put before all else.  Here’s a quick rundown of engineering frustrations at Google when I left:

  • Compiling & fixing other people’s code. This is a huge problem for the C++ developers at Google.  They spend massive amounts of time compiling (and bug fixing) “the world” to make their project work.  This needs to end.  Put an end to source-code distributions for cross-team dependencies.  Make teams (bigtable, GFS, Stubby, Chubby, etc.) deliver binaries & headers in some reasonable format.
  • Machine Resource Requests for products in the “less than a petabyte” class. Just hand out the resources pro-bono, track usage, and if they exceed some very high limit, then start charging.  Why is this a struggle?
  • LCE & SRE “blockers”.  Having support for Launch Coordination & Site Reliability is great, but when these people say “you can’t launch unless…” then you know they’re being a hindrance, and not a help.
  • Meetings.  Seriously, people are drenched in “status update” and “team” meetings. If your company has to have “No meetings Thursday” then you’re doing it wrong. How about “No meetings except for Thursday”.  That would make for a productive engineering team, not the other way around.
  • Weekly Snippets, perf, etc. I was continually amazed by the amount of “extra cruft work” that goes on.  I know it sounds important, but engineers should be coding & designing.
  • Perf, Interviews & lengthy interview feedback. The old fashioned model of getting together in a room to discuss a candidate is way more efficient.  Make sure that every single engineer in the building is participating in the interview process to spread the load more evenly.  Don’t let the internal recruiters pick engineers for interviews, as they have favoritism and are improperly motivated.   Limit to 1 interview per week, maximum.   Make a simple system for “I can’t make this interview” and “I think this resume looks shitty and don’t want to talk to this candidate.”
  • Discouragement of open source software. There is so much innovation going on in the open source world: Hadoop, MongoDB, Redis, Cassandra, memcached, Ruby on Rails, Django, Tornado (web framework), and many, many other products put Google infrastructure to shame when it comes to ease-of-use and product focus.  Engineers are discouraged from using these systems, to the point where they’re chastised for even thinking of using anything other than Bigtable/Spanner and GFS/Colossus for their products.

Get rid of the proprietary cluster management system.

Yes, seriously.  What they have is a glorified batch-scheduling system that makes modern datacenters feel like antiquated mainframes.  Dedicated machines and resources are what startups have, so give them to your best engineers, and they’ll do great things. You should have learned this from the teragoogle team.  Start building a better, Virtual Machine based system where engineers can own & manage machine images themselves, all the way down to the operating system, dependencies, etc.  If more structure is needed, use existing open source packages or develop new systems in house, and open source those.  Build new “non-standard” data centers that don’t use the old system, and that every engineer can use.

The cluster management system’s fatal flaw is that it requires too large an ecosystem, and pigeonholes running jobs into a far too restrictive container.  It doesn’t allow persistent local disk storage, since jobs can be terminated and relocated at any time.  Services running there are then cajoled into using Bigtable and/or Colossus for their persistent storage, which rules out virtually all other external database systems (MySQL, etc.).   This is an antiquated and overly constrained model for job allocation.

Switch to team-based distributed source control.

Teams or large related teams should manage their own source code.  Provide git-based hosting, and nothing else.  Cross-team deliverables should be done at the binary release level, not at the source code level.  Hard Makefile-type dependencies between teams need to be abolished.

Be the Bazaar, not the Cathedral.

Rethink the “lots of redundant, unreliable hardware” mantra.

Having to launch a simple service in multiple datacenters around the world, and having to deal with near-weekly datacenter maintenance shutdowns is unacceptable for an agile startup.  Startups need to focus on product, not process and infrastructure.  One persistent Amazon EC2 instance is much more valuable than 100 batch-scheduled jobs in a cluster that goes down for maintenance every week. Stop doing that.

Eliminate NIH-syndrome

Google has a very, very strong NIH (Not Invented Here) syndrome.  Alternate solutions (Hadoop, MongoDB, Redis, Cassandra, MySQL, RabbitMQ, etc.) are all seen as technically inferior and poorly engineered systems.  Google needs to get off its high horse, and look at what’s happening outside of its organization.  Hugely scalable services like Twitter are built on an almost entirely open source stack, and they’re doing it really efficiently.  Open source solutions have a product focus that’s missing from much of Google’s infrastructure-for-infrastructure’s-sake engineering endeavors.  Focusing on the product first, and using any available solution, is the agile way to experiment in new spaces.

Additionally, eliminating NIH syndrome means Google needs to allow these open systems into its production environment.  Amazon and RackSpace have nailed this with reliable, virtual hosting solutions, and this is allowing services built on those platforms to be portable, efficient, and agile.

Remember that small, special-purpose is more agile than big, general-purpose.

Google is famously good at building huge pieces of infrastructure that solve big, important problems. GFS & Colossus for file storage, Bigtable, Blobstore and Spanner for structured data storage, Caffeine for document storage and indexing.

But, when faced with a new problem or new requirements, projects are expected to pigeonhole their needs into one of these systems, or be chastised for “doing it wrong”. Additionally, when your application needs inevitably don’t fit or grow out of existing infrastructure capabilities, requests for improvement or enhancement are lost in the noise. This means small teams are crippled by the lack of agility of these monstrous systems.

Google’s engineers need to think & act like startup founders. Only develop what’s absolutely necessary to get your job done. Simplicity counts. Complex systems are hard to learn, debug, and maintain. Keep it small and focused.

Implement an in-house incubator.

Do this right now.  When a current employee comes to you and submits their resignation letter, and says they’re joining a startup, you should immediately respond with “Oh! Well, let me tell you about our in-house startup incubator…”

Put smart people together in a room, let them think freely about products and infrastructure, and good things will come of it.  In fact, I might argue that every Staff level engineer or higher should “go on sabbatical” to the in-house incubator for a period of a minimum of 6 months.   Rotate people in & out, and let them bring their incubator learnings back on to the main campus.  Have one incubator per geography, at a minimum, possibly more.  Let people choose their best friends/coworkers, and go off and do something great for 6 months.  No managers, no meetings, no supervision.

Make it very clear that good, small ideas matter.

This is so important.  One of the things I heard over and over was “If your product isn’t a billion-dollar idea, then it’s not worth Google’s time.”  This message sucks.  What you’re saying is “your great idea that might make millions per year is less important than a small tweak to ads or search”.  Even if it’s true, you need to foster innovation of much lesser initial impact.

Google’s acquisitions of companies in the $5-50mm range mean that at some level, small businesses are valued.  Make this very clear.  It sucks to have someone say “your $5mm idea isn’t big enough” on the one hand, and then watch Google buy up companies for $5mm each. This is bad precedent.

Eliminate internal language and framework cronyism.

By this, I mean: “Stop forcing people to do things The Google Way”.   There were several times where I had seen “unGoogly” system designs get shot down because they didn’t use Bigtable, GFS, Colossus, Spanner, MegaStore, BlobStore, or any of the other internal systems.

For example, languages like Python are frowned upon because they’re “too slow for web frontends”.  Let teams use whatever tools and languages they want, and are most efficient in. Don’t pass judgement on infrastructure, pass judgement on Products.  If someone launches a great system based on Oracle and a bunch of Perl CGI scripts running on Sun Sparc 5s, then you should praise them. If they’re crushed under load, then praise them even more for their success.

Engineers at Google spend huge amounts of their time being forced to prematurely optimize their backend and frontend infrastructure.  Most of the time, this benefits no one, as small products never get big enough to need such heavyweight systems, and are bogged down with the cost of multiple redundancy, and by using poorly behaved internal APIs that don’t meet direct product needs.

Make a general purpose cloud for internal use.

Amazon EC2 is a better ecosystem for fast iteration and innovation than Google’s internal cluster management system.  EC2 gives me reliability, and an easy way to start and stop entire services, not just individual jobs.  Long-running processes and open source code are embraced by EC2; running a replicated sharded MongoDB instance on EC2 is almost a breeze.  Google should focus on making a system that works within the entire Open Source ecosystem.

Acknowledge that 20% time is a lie.

Virtually no one I knew in my entire career there had an effective use of 20% time.  There are stories about how some products are launched exclusively via 20% time, and I’ve seen people use their 20% time to effectively search for a new internal position, but for the vast majority of engineers, 20% time is a myth.

I think it’s a great idea, and it needs to be made effective.  1 day per week isn’t reasonable (you can’t get enough done in just one day and it’s hard to carry momentum).  1 week per month would be great, but doesn’t do justice to your “main” project.  Something needs to budge here, and engineers need to be encouraged to take large amounts of time exploring new ideas and new directions.  Really fostering internal tools and collaboration might be the right answer.  I’m not sure, maybe they should just give up on it and give everyone a 20% raise.  Oh wait, they did that already.

Repeat your mistakes.

Engineers learn by doing, and learn by making mistakes.  Having rules about system design puts unnecessary constraints on thinking and products.  Having internal lore around things like “Google will never let another thing like Orkut ever happen again” is blatantly wrong.  Orkut was (and still is) a huge success, period.  None of the infrastructure stuff matters.  Even recent mistakes (Wave, etc.) should be praised and engineers should be encouraged to repeat those mistakes.

“Google Scale” is a myth.

Yes, I said it.

Google Search (the product) requires vast resources.  Almost nothing else does, and yet everything else is constrained and forced to run “at Google scale” when it’s completely unnecessary.

Giving engineers the freedom to think & design out of the box with respect to infrastructure and systems means you’ll be more efficient in the long run.  Providing reliable platforms and data centers means you’ll have less redundancy, and be more efficient.

Given that a single machine can easily have 64GB of RAM, 10TB of disk, and 8 CPUs, it’s amazing that any product launch needs more than just a couple of that class of machine.  Let engineers push the boundaries, make mistakes, and run on the edge.

A small system that falls down under load is a huge success.

A large system that’s wasting resources and has only a few users is a huge failure.

Posted in General | 123 Comments