Ripping SXSW 2008 mp3 files

I’ve grown impatient waiting for 2008.sxsw.com to release their torrent of mp3 files. I’m wondering if they’re going to do it at all.

So, I decided to just suck down the whole site and scrape out all the URLs to all the mp3 files and download them. It was very straightforward.

First, scrape the site by doing something like this:

wget -nd -nH -r --no-parent -nc http://2008.sxsw.com/music/showcases/alpha/0.html

Then, do something like:

grep mp3_download *.html

(Yes, they were silly enough to use a CSS class for all their mp3 download links named ‘mp3_download’.)

Redirect that output to a file, and you’ll have a bunch of raw HTML links. Pull that into something like emacs and do some replace-regexp commands to trim it to just the URLs themselves. (There are 740 of them.) I then took the resulting list of mp3s, split it into 2 files, and am running two copies of wget in parallel to suck them all down. Here’s a copy of the list of all 740 mp3 files.
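If you'd rather skip the editor step, here is a rough shell sketch of the same extract-and-split idea. It assumes each download link is an `<a>` tag sitting on a single line with a double-quoted href and the text mp3_download somewhere inside the tag; I haven't checked that against every page, so treat it as a starting point. The function names and the part1.txt / part2.txt / mp3_urls.txt file names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes one-line <a ...> tags with double-quoted href values.

# Print the href of every link whose tag mentions mp3_download, one per line.
extract_mp3_urls() {
    # -h: leave out file names; -o: print only the matching <a ...> tags
    grep -h -o '<a [^>]*mp3_download[^>]*>' "$@" |
        sed -n 's/.*href="\([^"]*\)".*/\1/p' |
        sort -u    # sort and drop duplicate URLs
}

# Split a URL list into two roughly equal halves: part1.txt and part2.txt.
split_in_two() {
    total=$(wc -l < "$1")
    half=$(( (total + 1) / 2 ))
    head -n "$half" "$1" > part1.txt
    tail -n +"$((half + 1))" "$1" > part2.txt
}

# Possible usage (left commented out so nothing is downloaded by accident):
#   extract_mp3_urls *.html > mp3_urls.txt
#   split_in_two mp3_urls.txt
#   wget -nc -i part1.txt &
#   wget -nc -i part2.txt &
#   wait
```

The `-i` option tells wget to read its URLs from a file, and `-nc` skips anything already downloaded, so an interrupted run can simply be started again.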

Send me an e-mail to my private account if you’d like me to hook you up with a .tar.bz2 of all 740 files. I wonder if they’ll release that .torrent soon? :)

UPDATE: The download completed overnight, and the resulting files total about 3.4GB.


9 Responses to “Ripping SXSW 2008 mp3 files”

  1. Anonymous Says:

    that’s a nifty bit of trickery there. i just downloaded the text file, reloaded that as a batch download in bitcomet and i’m all set. thanks a lot! i was beginning to doubt whether they’d put it up as well.

  2. Tim Says:

    One command after you’ve done the initial spider:

    grep mp3_download *.html | sed 's/.*href="\([^"]*\)".*/\1/' | xargs wget

  3. John Says:

    Thanks for the file list. I just finished a wget session to retrieve the list, but I came up with “only” 523 tracks. Either the missing tracks have been removed, or my session timed out & skipped them. I know you don’t know me from Adam, but is there any way I could get a copy of the full tar.gz?

    Thanks,
    jw

  4. An Austin Cow Says:

    I expect to see a torrent soon. I’ll post here when it’s ready.

  5. Bob O'Shaughnessy Says:

    There’s a torrent already up on the Pirate Bay.

  6. An Austin Cow Says:

    It’s here! http://hewgill.com/torrent/SXSW_2008_Showcasing_Artists-Release_1.torrent

  7. An Austin Cow Says:

    Yeah, I saw the Pirate Bay one. This one has more songs and isn’t hosted on a site that could get you pinned with an HR violation if you view it at work.

  8. John Says:

    I just started the torrent from hewgill. Seems pretty good. Since I already had 520+ of the songs it went to about 55% done upon startup. :-)

  9. douglas Says:

    this one was not in the list

    http://audio.sxsw.com/2008/mp3/Air_Waves-Ernie_and_The_Sand.mp3
