Mourning the Open API

by Matt Cholick

A few weeks ago, I got an email from Rotten Tomatoes letting me know that their API is going private. I should "re-apply via the Business Proposal Form" to get continued access. Of the major APIs I used to build my master's project several years back, this is the third and final one to close. That software, or something like it, would be impossible to build today.

For the project, I built a collaborative recommender system for movies based on tweets. The system had two large components. First, I built a classifier to decide if a tweet was positive or negative. To build the training set, I attached to Twitter's firehose and searched for tweets containing expressions like :) or :(, using them as noisy labels. Today, the firehose requires special permission; developers can no longer just start exploring this data or building something.
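
The emoticon labeling idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the emoticon lists, the function names, and the choice to drop tweets with mixed emoticons are all assumptions made for the example.

```python
# Hypothetical emoticon lists; the original project's exact lists are not known.
POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(")


def noisy_label(tweet):
    """Return 'pos', 'neg', or None if the tweet has no emoticon or mixed ones."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        # Either no signal at all or a contradictory one: skip the tweet.
        return None
    return "pos" if has_pos else "neg"


def build_training_set(tweets):
    """Turn raw tweet texts into (text, label) pairs using emoticons as labels."""
    examples = []
    for text in tweets:
        label = noisy_label(text)
        if label is None:
            continue
        # Remove the emoticons so a classifier trained on this data has to
        # learn from the words rather than from the label itself.
        cleaned = text
        for e in POSITIVE + NEGATIVE:
            cleaned = cleaned.replace(e, "")
        examples.append((" ".join(cleaned.split()), label))
    return examples
```

The labels are noisy because an emoticon is only a rough proxy for sentiment (sarcasm, for instance, is labeled wrong), which is why a large stream of tweets helps.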

Once I'd built a classifier, I needed a collection of accounts that had tweeted about several movies. Topsy was my source here. Even then, Twitter didn't offer historical access. Topsy provided a freemium API and search that let me build a dataset about older movies, which I needed to build a large enough collection of different items for recommendation. Topsy was purchased by Apple in 2013 and shut down in 2015. Today, there is no free source of historical tweets.

To build an informational page for my recommendations (links to reviews, poster art, and other information), I used Rotten Tomatoes. This let me put together a page for each movie without manual data entry. This API is now private.

Finally, as a new user entered the system, I read their entire timeline to find tweets about movies. This is what let me calculate similar users for collaborative recommendation. I also read the entire timelines of users surfaced by Topsy in an effort to build a larger dataset. Twitter's API changes in 2012 would have made this part much harder (specifically the rate-limiting). I likely wouldn't have been able to get sufficient data in time, as I ran data collection for weeks at the higher rate-limit to build my recommender.
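
To make the "similar users" step concrete, here is a minimal sketch of user-to-user similarity. It assumes each user is reduced to a dictionary mapping movie titles to +1 (positive tweet) or -1 (negative tweet) and uses cosine similarity; the post does not say which similarity measure the project used, so the measure, the function names, and the data layout are all assumptions.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two {movie: score} dicts (assumed layout)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, others, k=2):
    """Return up to k user names whose tastes correlate positively with target."""
    scored = [(cosine_similarity(target, prefs), name)
              for name, prefs in others.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score > 0]
```

A user with no movies in common scores 0 against everyone, which is why reading entire timelines mattered: the more movie tweets collected per user, the more overlap there is to compare.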

Running through this list, I'm reminded of Anil Dash's The Web We Lost. He builds a fantastic parallel between privately owned public spaces and technology platforms. There's a lot to that topic, which is worth visiting, but it's tangential to this discussion. More to my point, he talks extensively about the drive toward a consolidation of a diverse ecosystem into a few massive, non-interoperable giants that view their platforms as a walled garden. He also contrasts Flickr and Instagram. The former cares about metadata, and that is what makes so many things possible.

I really can see a stark contrast between Flickr and Instagram. Built years apart, the former embraces concepts like metadata, Creative Commons licensing, an API, and all the things that make it possible to pull its photos and make them a part of something else. I even found a 2007 book dedicated to Flickr mashups. In contrast, Instagram requires pre-approval of apps. It took years for Instagram to come to the web from mobile and years more before even basic things like web search were in place. Instagram's content is locked away, reflecting the walled garden the app was born in.

I miss the perspective of "Here's access to something that's uniquely our users' via an API; go build something we can't imagine." I hope that isn't a luxury that disappears as soon as a stock is public or growth slows down. Platforms need to make money; as a developer, Twitter and Rotten Tomatoes don't owe me anything. But... they do owe their users. These platforms are stewards and aggregators. Locking away this information does deprive their community. Whether it's something as silly as Klouchebag or something more profound, like Chicago tracking food poisoning, the web is a better place when we share.

It's sad to see all this interesting data disappearing behind walls.