Prize Winner 2010: Paul Ohm

Dieter Meurer Prize Lecture

Paul Ohm

16 September 2010

When Dr. Herberger informed me earlier this year that I had been awarded the Dieter Meurer prize, I reacted with a fair amount of surprise and healthy suspicion. As a scholar toiling away in relative obscurity at a small public law school in the foothills of the Rocky Mountains in the United States, I could not fathom that my work had been noticed in Germany, much less deemed worthy of an award.

But as I learned more about the Association for Computing in the Judiciary and about juris, my sense of suspicion faded. This is because the goals of these worthy and venerable institutions overlap so much and so well with my research agenda. Like all of you, I am trying to build bridges between law and computer science; together we are refusing to be satisfied with ossified and outmoded legal technologies and approaches; together we are pointing the way to a new, better approach to law and judging, one which embraces advances in technology.

The more I learned about the Association and, in particular, the work of Professor Herberger and the late Professor Meurer, the more my initial sense of doubt gave way to immense gratitude. I am also familiar with the amazing work of many of the past recipients of this award, and I am honored to be listed with them. So before I begin my substantive remarks, I wanted to thank Professor Herberger, the Association for Computing in the Judiciary, and juris for this honor. I am very grateful. I also wanted to specifically say thank you to Sabine Micka, who helped organize my travel.

When I am not writing articles that are also working computer programs, I specialize in information privacy law. Recently, I published an article in an American law journal entitled “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.”

The central contribution of this Article is to incorporate a new and exciting trend in computer science research into law and policy for the first time. In this sense, the Article fits very well with the goals of this conference.

The Article disrupts the central organizing principle of information privacy law, and because not every person in the room is an information privacy expert, I ask the experts to indulge me as I explain. Information privacy law relies heavily on the idea of anonymization, the term used to describe techniques that protect the privacy of people described in databases through deletion and filtration. For example, we delete names, government ID numbers, birth dates, and home addresses from databases, motivated by the idea that we can analyze the data left behind without compromising the identities of the people in the data. Anonymization provides the best of both worlds, promising both privacy and utility with one simple and inexpensive database operation.
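To make the idea concrete, here is a minimal sketch, in Python, of anonymization by deletion as just described. The column names and the file layout are hypothetical, chosen only for illustration; any real database schema would differ.

    # Minimal sketch of "anonymization by deletion": copy a table while
    # dropping the columns conventionally treated as direct identifiers.
    # The column names below are hypothetical examples, not a real schema.
    import csv

    DIRECT_IDENTIFIERS = {"name", "government_id", "birth_date", "home_address"}

    def anonymize(in_path, out_path):
        with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
            reader = csv.DictReader(src)
            kept = [c for c in reader.fieldnames if c not in DIRECT_IDENTIFIERS]
            writer = csv.DictWriter(dst, fieldnames=kept)
            writer.writeheader()
            for row in reader:
                writer.writerow({c: row[c] for c in kept})

The appeal lies exactly here: one cheap pass over the data, and the remaining columns appear safe to analyze and share.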

Anonymization has been trusted not only by database owners but also by governments; legislators and regulators worldwide have created laws and rules which reward the use of anonymization. In fact, I claim that every single privacy law ever written rewards anonymization in some way, large or small, express or implied. In many cases, anonymization provides a perfect safe harbor: anonymize your data and the law no longer applies at all.

This brings us to the central tenet of information privacy law, the one currently under attack.

In the United States, we refer to this as the concept of “Personally Identifiable Information,” or PII.

In the European Data Protection Directive, we encounter it through definitions of the term “personal data.”

The idea, no matter what its name, is that we can protect privacy by categorizing our data. We inspect our databases to separate information that identifies a person from information that does not. We approach this task almost like a biologist trying to classify different species of bird, or worm, or mushroom. Just as a mushroom scientist tries to divide the set of all mushrooms into poisonous and non-poisonous, so too does his information privacy counterpart try to divide the set of data into dangerous and benign.

This was a successful state of affairs for many decades. Thanks to the power of anonymization, our policymakers could rely on categorization, PII, and personal data to strike a balance, one which guaranteed privacy while leaving room for businesses and researchers to use data to better the world and grow the economy.

[Pause]

Unfortunately, the central premise upon which all of this rests — the power of anonymization — has been attacked in the past decade.

Computer scientists have repeatedly demonstrated the surprising power of reidentification. By analyzing the data that anonymization leaves behind, these researchers have shown that with only a bit of outside information and a bit of computational power, we can restore identity in anonymized data.

Let me give you only two examples:

First, in 1995, a graduate student named Latanya Sweeney analyzed a specific trio of information — a person’s birth date, U.S. postal code, and sex. She chose these three because many anonymized databases contained them, which was understandable since most people thought they could be left behind without compromising identity. The intuitions of many — including most experts — suggested we shared our birth date, postal code, and sex with many other people; we thought we could hide in a crowd of the many people who shared this information.

Dr. Sweeney proved these intuitions wrong. By analyzing U.S. census data, she determined that 87% of Americans were uniquely identified by these three pieces of information. What once was recognized as anonymized was suddenly rejected as identifying. Today, many American laws reflect Dr. Sweeney’s work by requiring the deletion of these three categories of information.
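The kind of measurement behind that 87% figure can be sketched in a few lines. The snippet below only illustrates counting how often a combination of quasi-identifiers is unique within a dataset; it is not Dr. Sweeney’s actual methodology, and the field names are hypothetical.

    # Illustration: what fraction of records is unique on the trio
    # (birth date, postal code, sex)? A record that is unique on these
    # fields can be singled out by anyone who already knows them.
    from collections import Counter

    def fraction_unique(records, keys=("birth_date", "postal_code", "sex")):
        records = list(records)
        if not records:
            return 0.0
        combos = Counter(tuple(r[k] for k in keys) for r in records)
        unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
        return unique / len(records)

Run over a population-scale dataset, a count like this is what turns the intuition about hiding in the crowd into a number.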

But if regulators thought that Dr. Sweeney had proved that there was something unusual or special about those three pieces of information, they have learned quite recently how wrong they were. There are many other types of data that share this ability to identify. In fact, some have suggested that every piece of useful information about a person can be used to identify, if it is connected with the right piece of outside information.

As only one example, consider an American company, Netflix, which rents movies on DVD delivered through the postal mail. On the Netflix website, users rate the movies they have seen, to help Netflix suggest other movies they might enjoy. In an experiment in Internet collaboration — one, I should add, that has been celebrated for its many non-privacy-related contributions — Netflix released one hundred million records revealing how half a million customers had rated movies, but only after first anonymizing the data to protect identity.

A mere two weeks after the data release, two researchers named Arvind Narayanan and Vitaly Shmatikov announced a surprising result: the movies we watch act like fingerprints. If you know only a little about a Netflix customer’s movie-watching habits, you have a good chance of discovering his or her identity.

For example, the researchers discovered that if you know six somewhat obscure movies a person watched, and nothing else, you can identify 84% of the people in the database. If you know six movies they watched and the approximate dates on which they rated them, you can identify 99% of the people.
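The underlying attack is a linkage: score every anonymized record against the handful of (movie, approximate rating date) facts the adversary already knows, and take the best match. The toy sketch below conveys only that idea; it is not the researchers’ actual weighted scoring algorithm, and the data structures are invented for illustration.

    # Toy linkage sketch: 'known' is a list of (movie_id, rating_date) pairs the
    # adversary has observed about a target; 'records' maps each pseudonymous
    # user id to a dict of {movie_id: rating_date} from the released dataset.
    # Dates are datetime.date objects. This simplifies the published attack.
    def best_match(known, records, date_slack_days=14):
        def score(history):
            hits = 0
            for movie, when in known:
                rated = history.get(movie)
                if rated is not None and abs((rated - when).days) <= date_slack_days:
                    hits += 1
            return hits
        return max(records, key=lambda uid: score(records[uid]))

Because movie-rating histories are so sparse, a handful of matching titles and dates is typically enough to single out one record, which is why six obscure movies go so far.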

The lesson? When Netflix customers are asked at a dinner party to list their six favorite obscure movies, they cannot answer unless they want every person at the table to be able to look up every movie they have ever rated with Netflix.

But more seriously, what is the broader lesson?

If we continue to embrace the old PII/personal data approach to protecting privacy, we will end up with worthless laws, because these categories will continue to expand with each new advance in reidentification. For example, the American health privacy law, HIPAA, lists eighteen categories of information a health provider can delete to fall outside the law. Should American regulators expand this list to include movie ratings? Of course not; this would miss the point entirely.

Unfortunately, in 15 minutes, the best I can do is share the depressing and bleak part of the story. Time doesn’t allow me to share my solutions in detail, except to say one thing: nothing we do to replace our old laws will share the power and ease of solutions based on anonymization. Preserving privacy will become even more difficult than it is today. Beyond this note, I refer you to the paper to see the entire story.

This concludes my substantive remarks. Please let me reiterate: I am most grateful to have received this award. I look forward to meeting many of you and learning from many of you throughout the day. Have a wonderful conference, and once again, thank you.
