Profiling for fun and profit (using gravatars).
I have been using sites which use gravatars for quite some time. I probably saw them first on blog posts, but more recently many websites such as stackoverflow, uservoice and userscripts have started using gravatars too.
I assume you know what gravatars are. If not, please have a quick look at www.gravatar.com and come back here afterward.
So, gravatars. My gravatar looks like this:
The url for that image is http://www.gravatar.com/avatar/28f68a836b57094162e2b56f4c5c73aa?s=96.
I signed up at gravatar.com, uploaded a picture, and that's what my gravatar is. You can do this too. If you do not, gravatar will give you an identicon:
Or a monsterid:
Or a different default image chosen by the site operator. In all cases, the image you see on your
user profile on stackoverflow is the same image you see on uservoice, or on a blog post, or
on libre.fm, etc. This works because every time you make a blog post, or sign up for some site,
you enter an email address. Most people use the same email address for all the
sites they visit. The site you visit turns your email address into a gravatar
url.
How to calculate the gravatar url from an email address is documented on gravatar.com: trim the address, lowercase it, and take the MD5 hash of the result. When I follow those instructions, I get this:
http://www.gravatar.com/avatar/28f68a836b57094162e2b56f4c5c73aa.jpg
As you may have noticed, that is almost the same url I mentioned at the start of my post,
which is the url stackoverflow puts on my profile page. It is also the url used when I comment on
the musicbrainz blog, and the url on my libre.fm profile page. And remember, most of these
sites will always publish that url, whether you have signed up at gravatar or not. Now isn't
that interesting?
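The calculation described above can be sketched in a few lines of Python. This is only a minimal sketch, not code taken from gravatar or from any of the sites mentioned: the function names are my own, and someone@example.com is a made-up address.

```python
import hashlib


def gravatar_hash(email):
    """Return the hex MD5 digest of the trimmed, lowercased email address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email, size=None):
    """Build a gravatar image url; size is the optional ?s= parameter."""
    url = "http://www.gravatar.com/avatar/" + gravatar_hash(email)
    if size is not None:
        url += "?s=%d" % size
    return url


if __name__ == "__main__":
    # Hypothetical address, used for illustration only.
    print(gravatar_url("Someone@Example.com ", size=96))
```

Because the address is normalized before hashing, every site which follows the same recipe ends up with exactly the same hash for the same person.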
Let's say I have a list of email addresses harvested from some place, and I want to figure
out if they're coders so I can send them targeted spam. Let's figure out what they call
themselves too, so I can address them as "Dear
It turns out that for stackoverflow, it is quite simple. Stackoverflow uses sequential userids, so it took me only a few minutes to generate the url of every single profile on stackoverflow. I then ran a script to fetch all those urls, with a 1 second pause between fetches because I did not want to hammer their servers too much. I ran it on a few separate servers simultaneously and had all stackoverflow profiles the next morning.
userscripts.org and uservoice.com also have gravatars and use sequential userids, so they would be just as easy to fetch. And once you've harvested them, it is even easier to fetch only the new profiles every day to keep your gravatar database updated. A site which doesn't use sequential userids is a bit harder to harvest, but I think for most sites this shouldn't be much of a problem.
As a proof-of-concept, I needed something to compare my nice collection of stackoverflow gravatars with. So I also fetched the gravatars from everyone who commented on the issues at musicbrainz.uservoice.com and everyone who commented on blog posts at blog.musicbrainz.org. I am not going to publish any user profiles here, but from my very limited set of uservoice/blog harvesting I could match 27 musicbrainz people to their stackoverflow accounts. Most of these used more-or-less the same name or nick on both sites, so it would've been easy to link them anyway -- but that is their choice. Some do use different names, and may want to keep these identities separate, which obviously they have a right to.
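To show how little work the matching step is, here is a rough sketch of it. This is not the script I actually ran: the function names, the regular expression and the sample data are all made up for illustration, and it assumes you already have a mapping from display name to gravatar url for each site.

```python
import re

# Assumption: the hash is the 32 hex characters following /avatar/ in the url.
HASH_RE = re.compile(r"/avatar/([0-9a-f]{32})")


def hash_from_url(url):
    """Extract the gravatar hash from an image url, or None if there is none."""
    match = HASH_RE.search(url.lower())
    return match.group(1) if match else None


def link_profiles(site_a, site_b):
    """Pair up display names from two sites which share a gravatar hash.

    Both arguments map a display name to the gravatar url shown for it.
    Returns a list of (name_on_site_a, name_on_site_b) tuples.
    """
    names_by_hash = {}
    for name, url in site_a.items():
        digest = hash_from_url(url)
        if digest is not None:
            names_by_hash.setdefault(digest, []).append(name)

    matches = []
    for name, url in site_b.items():
        digest = hash_from_url(url)
        for other_name in names_by_hash.get(digest, []):
            matches.append((other_name, name))
    return matches


if __name__ == "__main__":
    # Made-up profiles with made-up hashes.
    forum = {"coder42": "http://www.gravatar.com/avatar/" + "a" * 32 + "?s=96"}
    blog = {"Quiet Commenter": "http://www.gravatar.com/avatar/" + "a" * 32 + ".jpg"}
    print(link_profiles(forum, blog))
```

Note that the names do not have to look alike at all. The hash alone ties the two profiles together.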
So, how serious is all this?
The more sites start using gravatar, the more interesting information can be collected about their users. What if a torrent site like thepiratebay uses gravatars? Or facebook? Or an adult site? A dating site? A site about dieting? A site about your weird furry roleplaying fetish? You may not want those identities linked. If you know how gravatar works, you can work around this by using a different email address for each identity. But if you don't, how would you know those sites are publishing information about you which can tie you to other sites? www.gravatar.com doesn't mention that, and neither do the sites using gravatar. And remember, in almost all cases, there is no opt-out: the gravatar url will be published whether you actually have a gravatar or not.
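The workaround of using a different email address for each identity works because the hash changes completely when the address changes. A small sketch, with made-up addresses and a helper name of my own choosing:

```python
import hashlib


def gravatar_hash(email):
    """Hex MD5 digest of the trimmed, lowercased email address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical identities, one address per site.
identities = {
    "coding forum": "me.coding@example.com",
    "dating site": "me.dating@example.com",
}

hashes = {site: gravatar_hash(address) for site, address in identities.items()}

if __name__ == "__main__":
    for site, digest in hashes.items():
        print(site, digest)
```

The two hashes have nothing visibly in common, so someone comparing gravatar urls cannot link the two profiles. That only helps if you really do keep the addresses separate.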
So, to conclude: if you are running a site which uses gravatars, please allow users to turn them off, and more importantly, educate your users on the risks.
Thank you for reading :)
ps. Image by Jared, taken from http://www.flickr.com/photos/generated/323388124/