Determining who acted in two known films
Have you ever seen someone in movie X and thought to yourself “Hey, weren’t they in movie Y too?”
This happened to me last night when I was watching “Casino”. The parking lot attendant from a certain scene looked just like a character from “Fear and Loathing in Las Vegas”.
To the *nix shell, Robin!</Bruce Wayne Voice>
I wrote a short script to download the IMDb “profile” pages for all the actors in movie X and list which ones also acted in movie Y. This is, effectively, a set intersection operation.
I could’ve registered for IMDb’s 14-day trial of their $20/mo “Pro” service which allows advanced searches on their database, but I wanted a challenge.
Here are the important pieces of the script:
# grab (only) the profile pages for each actor who performed in title
# 'tt0112641' ('casino'). recurse, but only 1 level deep into links.
wget --recursive --level=1 --wait=1 --include-directories=/name \
  'http://www.imdb.com/title/tt0112641/fullcredits'
# determine if any of the retrieved profile pages contain a reference
# to 'fear and loathing'
find . -type f -print | xargs grep --files-with-matches \
  --ignore-case 'fear and loathing'
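In hindsight, the find/xargs pipeline can probably be collapsed into a single recursive grep. Here is a minimal sketch, assuming a grep that supports `-r` (GNU and BSD grep both do). The function name `pages_mentioning` is made up for illustration, and it treats the title as a fixed string rather than a regular expression.

```shell
# List files under a directory that mention a phrase, ignoring case.
# $1 = phrase to look for (matched literally, not as a regex)
# $2 = directory to search (defaults to the current directory)
# 'pages_mentioning' is a hypothetical helper name, not part of the original script.
pages_mentioning() {
    grep -r -l -i -F -- "$1" "${2:-.}"
}
```

Running `pages_mentioning 'fear and loathing' www.imdb.com` from the download directory should print the same kind of file list as the pipeline above.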
wget will, by default, obey a site’s robots.txt file. IMDb.com’s robots.txt says you’ll get fined 1 cent for every request to their server that causes a denial of service! So, be nice and only download from the crawl-able portions of their site (and do it slowly with `--wait`, otherwise you might get temporarily blocked with HTTP 500 errors).
The script gave me ~10 results, most of which were off-screen roles, e.g. “set designer”. One, however, was the dude I was looking for: Brian LeBaron! He acted as a parking attendant in both Casino and Fear and Loathing.
Update: There are now much more elegant ways to interact with IMDb. See, for example, IMDbPY.
Update: More retrospection: the “combine” utility in moreutils (via deb-a-day) might’ve been handy.
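Even without moreutils, the intersection can be done with plain `comm`. The sketch below is a different approach from the script above: instead of grepping every profile page, it pulls the person IDs out of two saved credits pages and intersects them. It assumes both pages are already on disk and that people are linked as `/name/nm<digits>`, which is a guess about the markup. The function names `cast_ids` and `common_cast` are made up for illustration.

```shell
# Print the unique, sorted person IDs linked from one saved credits page.
# Assumes links look like /name/nm<digits> (an assumption about the HTML).
cast_ids() {
    grep -o '/name/nm[0-9][0-9]*' "$1" | sed 's|^/name/||' | sort -u
}

# Print the IDs that appear in both saved pages (set intersection).
# $1, $2 = paths to the two saved credits pages (hypothetical file names)
common_cast() {
    a=$(mktemp) && b=$(mktemp) || return 1
    cast_ids "$1" > "$a"
    cast_ids "$2" > "$b"
    # comm -12 hides lines unique to either file, leaving only the shared ones.
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}
```

Something like `common_cast casino.html fear.html` would then print one ID per shared cast or crew member, which still leaves the off-screen roles to be weeded out by hand.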




