seven1m.sdf.org

Download Your Twitter Archive and Images Too

written May 2018

I recently exited Twitter, but I wanted to take my witty tweets with me. Fortunately, Twitter allows one to download all their tweets in the form of some HTML, CSS, and JavaScript in a Zip file.

Now, setting aside my concerns about the longevity of the circa-2018 single-page JavaScript app Twitter provides, I was surprised to find that none of the media (images, videos, etc.) is inside the archive. The images are hotlinked from Twitter's servers! Facepalm.

I'm not saying my old tweets are proverbs of wisdom or my memes worthy of the Smithsonian, but I prefer not to worry about Twitter's investors passing down my memories to the next generation. (I'll leave that up to Dropbox, if my kids can find my password and figure out the 2FA.)

So I wrote a little Ruby script to download the media files from the web and replace the references in the HTML with local links. Here is the code:

The Code

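require 'http'
require 'fileutils'
require 'digest'

# Downloaded media is stored here, next to index.html.
FileUtils.mkdir_p('media')

# The tweet data lives in JavaScript files; index.html references assets too.
paths = Dir['data/**/*.js'] + ['index.html']

paths.each_with_index do |path, index|
  puts "#{index + 1} of #{paths.size}"
  data = File.read(path)
  # Match any quoted URL that ends in a known media extension.
  data.gsub!(/"(http[^"]+)(\.(ico|png|gif|jpg|jpeg|mov|mp4|mpg|mpeg))"/i) do |original|
    print '.'
    ext = Regexp.last_match[2]
    # Slashes in the data files are escaped as \/, so unescape them.
    url = Regexp.last_match[1].gsub(%r{\\/}, '/')
    # Name the local copy after a hash of the URL so names don't collide.
    name = Digest::MD5.hexdigest(url) + ext
    asset_path = 'media/' + name
    unless File.exist?(asset_path)
      begin
        raw = HTTP.get(url + ext).to_s
        # Media is binary, so write it in binary mode.
        File.binwrite(asset_path, raw)
      rescue HTTP::ConnectionError
        puts url + ext + ' could not be downloaded'
        # Keep the remote reference; a bare `next` would replace it with an empty string.
        next original
      end
    end
    '"' + asset_path + '"'
  end
  File.write(path, data)
  puts
end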
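
To see what the substitution does to a single reference without touching the network, here is a small sketch that reuses the script's regular expression and naming scheme. The helper name rewrite_references and the pbs.example.com URL are made up for illustration, and the escaped-slash form of the sample is an assumption based on the unescaping step in the script:

```ruby
require 'digest'

# Same pattern the script uses: a quoted URL ending in a media extension.
MEDIA_PATTERN = /"(http[^"]+)(\.(ico|png|gif|jpg|jpeg|mov|mp4|mpg|mpeg))"/i

# Hypothetical helper: returns the rewritten text plus a map of
# local path => remote URL that would still need to be downloaded.
def rewrite_references(text)
  downloads = {}
  rewritten = text.gsub(MEDIA_PATTERN) do
    match = Regexp.last_match
    base = match[1].gsub('\\/', '/') # undo the \/ escaping
    ext = match[2]
    local = 'media/' + Digest::MD5.hexdigest(base) + ext
    downloads[local] = base + ext
    '"' + local + '"'
  end
  [rewritten, downloads]
end

# Made-up sample line in the style of the archive's data files.
sample = '{ "media_url" : "http:\/\/pbs.example.com\/media\/abc123.jpg" }'
rewritten, downloads = rewrite_references(sample)
puts rewritten
downloads.each { |local, remote| puts "#{local} <- #{remote}" }
```

The reference in the sample ends up pointing at a file under media/ named after the MD5 hash of the URL, which is what lets the script skip files it has already downloaded on a second run.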

Usage

  1. First, you'll need Ruby. I believe this will work on any Ruby version in the 2.x range, but I only tested with Ruby 2.3 because that's what I had lying around.
  2. Paste the Ruby code above in a file and call it archive.rb.
  3. Run the following to install the 'http' gem. I could have used net/http but I was lazy.
    gem install http
  4. Now unzip your Twitter archive and run the script:
    unzip archive.zip -d archive
    cd archive
    ruby path/to/archive.rb
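
Step 3 mentions net/http as an alternative to the 'http' gem. Here is a minimal sketch of what that might look like using only the standard library. The fetch_media name and the redirect limit of 5 are made up for illustration and are not part of the script above:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetch a URL with net/http, following up to `limit`
# redirects. Returns the body as a string, or nil if the download failed.
# Assumes redirect targets are absolute URLs.
def fetch_media(url, limit = 5)
  return nil if limit.zero?

  response = Net::HTTP.get_response(URI(url))
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    fetch_media(response['location'], limit - 1)
  else
    warn "#{url} returned status #{response.code}"
    nil
  end
rescue SocketError, SystemCallError, Timeout::Error => e
  warn "#{url} could not be downloaded (#{e.class})"
  nil
end
```

With this in place, the call to HTTP.get(url + ext).to_s in the script could become fetch_media(url + ext), with a nil result treated as a failed download, and the gem install step would no longer be needed.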

And that's it! Let it run, and at the end you should have slightly less disk space and slightly more peace of mind!
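
If you want to double-check the result, a rough sketch like the following lists any remote media references still left in the archive, for example ones that failed to download. It reuses the script's pattern. The remaining_remote_media name is made up for illustration, and it assumes you run it from inside the unzipped archive directory:

```ruby
# Same shape as the script's pattern: a quoted URL ending in a media extension.
MEDIA_URL = /"(http[^"]+\.(?:ico|png|gif|jpg|jpeg|mov|mp4|mpg|mpeg))"/i

# Hypothetical helper: returns the remote media URLs still present in the text.
def remaining_remote_media(text)
  text.scan(MEDIA_URL).flatten.map { |url| url.gsub('\\/', '/') }
end

if __FILE__ == $PROGRAM_NAME
  paths = Dir['data/**/*.js'] + ['index.html']
  paths.each do |path|
    next unless File.exist?(path)

    leftovers = remaining_remote_media(File.read(path))
    next if leftovers.empty?

    puts "#{path}: #{leftovers.size} remote media reference(s) left"
    leftovers.each { |url| puts "  #{url}" }
  end
end
```

No output means every matching reference now points at a local file.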

