How does Release Date Affect Oscars Best Picture Wins?
If you're just here for the answer to the title question, it is: Quite a bit. About 25% of the winners were released in December, and over half were released in the 3 months at the end of the year.
I used the enjoyable and optimistic-sounding Cheerio npm module (DOM traversal/selection a la jQuery, but in node.js) to scrape , and followed along with Mike Bostock's gentle d3 bar chart tutorial to put the data together in jsbin.
Here are the results:
I was hoping to do a little more explanation of how this was generated, but I'll save that for a future post. The code is pretty quick-and-dirty, anyway. If you'd like to do something with the cleaned-up data, the movies.json file is on github.
What did I learn? Well, not a whole lot. My initial expectation was that the movies were released toward the end of the year. In retrospect, even though it seemed official, the Oscars.org best picture winners page was probably not the best one to use. I should have been clued in by the raw php code in the source HTML there. It seemed marginally easier to parse than Wikipedia, though, so I went with it, but the extra time spent cleaning up data by hand offset any gains.
I was a bit surprised to realize how complicated release dates can be. The same movie gets released multiple times in the same country, and also in other countries, and also sometimes gets re-released. I had to put together some simple heuristics to choose the most appropriate date from lists like this one for The Sting.
Not-too-surprising fact: Lots of movies sneak in a limited release somewhere late in December but aren't available widely in the US until the following year (like Million Dollar Baby, which had a limited release in mid-December and the wide release about 7 weeks later).
Most surprising thing: Casablanca appears to be the only film listed to win for a year other than the one in which it was released. According to IMDB it was released in 1942 yet somehow won the 16th Academy Awards in 1943. Perhaps the notion of 'release date' is even looser than I realized.