Stripping Search

In response to regulatory pressure, and to apply some pressure on its competition, Yahoo has announced that after 90 days it will anonymize search queries and remove personally identifiable information (PII) from them.  Specifically, Yahoo will delete the last eight bits from the IP address associated with a search.  Further, Yahoo will remove some PII, such as names, phone numbers, and Social Security numbers, from the searches.  The goal is to (eventually) destroy the ties between a person and what that person searches for, which could include embarrassing, compromising, or sensitive items such as information about medical conditions, political opposition materials, adult entertainment, etc.
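To make the PII-removal step concrete, here is a minimal sketch in Python.  Yahoo has not described how it scrubs queries, so everything here is an assumption for illustration: the function name scrub_query, the two patterns (US-style phone numbers and Social Security numbers), and the placeholder tokens are all hypothetical.  Names are deliberately left out, because they cannot be caught reliably with a simple pattern.

```python
import re

# Hypothetical patterns, for illustration only; not Yahoo's actual method.
# US Social Security number written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# US phone number written as 555-123-4567, 555.123.4567, or (555) 123-4567.
PHONE_PATTERN = re.compile(
    r"(?<!\d)(?:\(\d{3}\)\s*|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"
)


def scrub_query(query: str) -> str:
    """Replace SSN-like and phone-like substrings with placeholder tokens."""
    query = SSN_PATTERN.sub("[SSN]", query)
    query = PHONE_PATTERN.sub("[PHONE]", query)
    return query


print(scrub_query("call 555-123-4567 about ssn 123-45-6789"))
# call [PHONE] about ssn [SSN]
```

Even this small sketch shows why scrubbing is hard to get right: anything that does not fit an expected format (a name, an address, a phone number typed without separators) passes through untouched.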

There are two points I want to draw your attention to.  The first point is related to the amount of time search providers, like Yahoo, hold identifiable search queries.  Regulators have recommended that search vendors reduce how long they hold identifiable searches.  The EU has recommended 6 months, for example.  Yahoo has taken a laudable step by cutting its retention time from 13 months to 90 days.

In the future, the time it takes a search provider to extract whatever goodness it wants out of a search query (to feed its varied businesses) and anonymize that query will reach zero.  External pressures aside, the Googles and Yahoos of the world will achieve near-instantaneous goodness-extraction/anonymization of search queries simply because it reduces what they have to store, maintain, and worry about.  That being said, even though search providers will be able to achieve near-instantaneous extraction and anonymization, they will never be able to put it into practice.  Why?  Because there will always be a desire on the part of law enforcement to gain access to those identifiable searches.

The second point relates to the methods and outputs of the anonymization process.  The industry needs to provide greater transparency in its anonymization methods to ensure that the scrubbed queries are truly anonymous.  Consider AOL Stalker.  AOL thought it had scrubbed its searches, but in reality those searches were fairly trivial to de-anonymize.

Removing the last 8 bits of the IP address, as Yahoo and Google are doing, certainly helps to anonymize a search, but it does not do so completely.  In fact, all that removing the last 8 bits does is render my IP address indistinguishable from the 255 other IP addresses that share its first 24 bits, which is hardly anonymized.  Once a search provider extracts what it needs from a search, I question why it has to retain any IP information at all.
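The arithmetic behind that figure can be sketched in a few lines of Python.  This assumes IPv4 and assumes that "removing" the last 8 bits means setting them to zero; neither provider has published the exact mechanics, and the function names and the sample address (taken from a documentation-only range) are mine.

```python
import ipaddress


def truncate_ipv4(address: str, bits_removed: int = 8) -> str:
    """Zero the low-order bits of an IPv4 address (assumed meaning of 'removing')."""
    ip = ipaddress.IPv4Address(address)
    # Build a 32-bit mask that keeps only the high-order bits.
    mask = (0xFFFFFFFF << bits_removed) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(ip) & mask))


def anonymity_set_size(bits_removed: int = 8) -> int:
    """Number of distinct addresses that collapse to the same truncated value."""
    return 2 ** bits_removed


print(truncate_ipv4("203.0.113.77"))  # 203.0.113.0
print(anonymity_set_size() - 1)       # 255 other addresses share this value
```

Every address from 203.0.113.0 through 203.0.113.255 maps to the same stored value, so the truncated address still narrows a search down to a group of at most 256 addresses.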

I applaud Yahoo’s announcement; decreasing retention time is a good thing.  But I’d like to ask two more things of the search providers.  First, work together, in a transparent manner, to ensure that the anonymization methods produce truly anonymous search query data.  Make sure that when you strip a search of PII of all sorts, including the IP address, it cannot be transformed back into an identifiable search.  Second, work with browser makers to create an anonymous search mode.  Akin to the private browsing mode of better browsers everywhere, an anonymous search mode would indicate to the search provider that the search being submitted from the browser must be anonymized immediately.

With announcements like Yahoo’s, 2009 may shape up to be a great year for privacy.

(Cross-posted from Burton Group’s Identity Blog.)
