Determining who acted in two known films


Have you ever seen someone in a movie and thought to yourself “Hey, weren’t they in movie Y too?”

This happened to me last night when I was watching “Casino”. The parking lot attendant in one scene looked just like a character from “Fear and Loathing in Las Vegas”.

To the *nix shell, Robin!</Bruce Wayne Voice>

I wrote a short script to download the IMDb “profile” pages for all the actors in movie X and list which ones also acted in movie Y. This is, effectively, a set intersection operation.

I could’ve registered for IMDb’s 14-day trial of their $20/mo “Pro” service which allows advanced searches on their database, but I wanted a challenge.

Here are the important pieces of the script:


# grab (only) the profile pages for each actor who performed in title
# 'tt0112641' ('casino'). recurse, but only 1 level deep into links.
# (IMDb profile URLs live under /name/, so that is the directory to include.)
wget --recursive --level=1 --wait=1 --include-directories=/name \
    'http://www.imdb.com/title/tt0112641/fullcredits'

# determine if any of the retrieved profile pages contain a reference
# to 'fear and loathing'
find . -type f -print | xargs grep --files-with-matches \
    --ignore-case 'fear and loathing'


wget will, by default, obey a site’s robots.txt file. IMDb.com’s robots.txt says you’ll get fined 1 cent for every request to their server that causes a denial of service! So, be nice and only download from the crawl-able portions of their site (and do it slowly with `--wait`, otherwise you might get temporarily blocked with HTTP 500 errors).

The script gave me ~10 results, most of which were off-screen roles, e.g. “set designer”. One, however, was the dude I was looking for: Brian LeBaron! He acted as a parking attendant in both Casino and Fear and Loathing.

Update: There are now much more elegant ways to interact with IMDb. See, for example, IMDbPY.

Update: More retrospection: the “combine” utility in moreutils (via deb-a-day) might’ve been handy.
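
For the curious, here is a rough sketch of what that explicit set intersection might look like, using `comm` from coreutils as a stand-in for “combine”. It is not the script I actually ran. It assumes you have saved the full credits page of each movie to a local file, and that cast links in those pages look like `/name/nm` followed by digits. The function names `extract_name_ids` and `common_name_ids` are made up for illustration.

```shell
#!/bin/sh

# Hypothetical helper: print the unique name IDs (nm + digits) linked from
# a saved credits page. Assumes the links look like /name/nm0000000/.
extract_name_ids() {
    grep -o '/name/nm[0-9][0-9]*' "$1" | sed 's|^/name/||' | sort -u
}

# Hypothetical helper: print the IDs that appear in both saved pages.
common_name_ids() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    extract_name_ids "$1" > "$tmp_a"
    extract_name_ids "$2" > "$tmp_b"
    # comm needs sorted input; -12 hides the lines unique to either file,
    # which leaves only the lines common to both (the intersection).
    comm -12 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

Calling `common_name_ids casino.html fear.html` (two hypothetical file names) would print one ID per line for everyone credited in both pages. That means two page downloads instead of one per cast member, which is also kinder to the server.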

