written May 2018, updated May 2021
Note: I got back on Twitter in 2020.
I recently exited Twitter, but I wanted to take my witty tweets with me. Fortunately, Twitter lets you download all your tweets as a Zip file of HTML, CSS, and JavaScript.
Now, setting aside my concerns about the longevity of the circa-2018 single-page JavaScript app Twitter provides, I was surprised to find that none of the media (images, videos, etc.) are inside the archive: the images are hotlinked from Twitter's servers! Facepalm.
I'm not saying my old tweets are proverbs of wisdom or my memes worthy of the Smithsonian, but I prefer not to worry about Twitter's investors passing down my memories to the next generation. (I'll leave that up to Dropbox, if my kids can find my password and figure out the 2FA.)
So I wrote a little Ruby script to download the media files from the web and replace the references in the archive's HTML and JavaScript files with local links. Here is the code:
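require 'http'
require 'fileutils'
require 'digest'

FileUtils.mkdir_p('media')

# Every file in the archive that may reference remote media.
paths = Dir['data/**/*.js'] + ['index.html']

paths.each_with_index do |path, index|
  puts "#{index + 1} of #{paths.size}"
  data = File.read(path)
  # Match quoted URLs that end in a known media extension.
  data.gsub!(/"(http[^"]+)(\.(ico|png|gif|jpg|jpeg|mov|mp4|mpg|mpeg))"/i) do |original|
    print '.'
    ext = Regexp.last_match[2]
    # Undo the "\/" escaping of slashes in the URL.
    url = Regexp.last_match[1].gsub(%r{\\/}, '/')
    # Name the local copy after the MD5 of the URL, keeping the extension.
    name = Digest::MD5.hexdigest(url) + ext
    asset_path = 'media/' + name
    unless File.exist?(asset_path)
      begin
        raw = HTTP.get(url + ext).to_s
        # Media files are binary, so write them without newline or encoding conversion.
        File.binwrite(asset_path, raw)
      rescue HTTP::ConnectionError
        puts url + ext + ' could not be downloaded'
        # Keep the original reference; a bare `next` would replace it with an empty string.
        next original
      end
    end
    '"' + asset_path + '"'
  end
  File.write(path, data)
  puts
end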
Save the script as archive.rb.
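To see what the rewrite does to a single reference without touching the network, here is a minimal sketch. It reuses the script's pattern and its MD5-based naming; the helper name `local_asset_path` and the sample line are made up for illustration and are not part of the script or of a real archive.

```ruby
require 'digest'

# Same pattern as the script: a quoted URL ending in a known media extension.
media_url = /"(http[^"]+)(\.(ico|png|gif|jpg|jpeg|mov|mp4|mpg|mpeg))"/i

# Hypothetical helper (not in the original script): maps a matched URL
# and its extension to the local path the script would use.
def local_asset_path(escaped_url, ext)
  url = escaped_url.gsub('\\/', '/') # undo the "\/" escaping of slashes
  File.join('media', Digest::MD5.hexdigest(url) + ext)
end

# Made-up sample line in the style of the archive's .js data files.
sample = '{"media_url_https": "https:\/\/pbs.twimg.com\/media\/example.jpg"}'

rewritten = sample.gsub(media_url) do
  match = Regexp.last_match
  %("#{local_asset_path(match[1], match[2])}")
end

puts rewritten
```

The printed line keeps the JSON key but points at a file under media/ whose name is a 32-character hex digest followed by .jpg. Because the name is derived from the URL, running the script twice reuses the file already on disk instead of downloading it again.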
gem install http

unzip archive.zip -d archive
cd archive
ruby path/to/archive.rb
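Once the script has finished, a quick check can tell you whether any remote media references are still in the files. This is only a sketch: it assumes you run it from inside the unzipped archive directory, it reuses the script's file list and pattern, and the helper name `remaining_media_urls` is hypothetical.

```ruby
# Same pattern as the archive script.
pattern = /"(http[^"]+)(\.(ico|png|gif|jpg|jpeg|mov|mp4|mpg|mpeg))"/i

# Hypothetical helper: returns the remote media URLs still present in a string,
# with the "\/" escaping of slashes undone.
def remaining_media_urls(text, pattern)
  text.scan(pattern).map { |url, ext, _| url.gsub('\\/', '/') + ext }
end

# Same file list as the archive script; skip anything that does not exist.
paths = Dir['data/**/*.js'] + ['index.html']
leftovers = paths.select { |path| File.exist?(path) }
                 .flat_map { |path| remaining_media_urls(File.read(path), pattern) }
                 .uniq

puts "#{leftovers.size} remote media reference(s) remain"
leftovers.each { |url| puts "  #{url}" }
```

A count of zero means nothing in those files still matches the pattern. It does not prove every file under media/ downloaded intact, so it is worth opening a few of them.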
And that's it! Let it run, and at the end you should have slightly less disk space and slightly more peace of mind!