Imagine reading an article about a new dataset with 1 million annotated facial images… and suddenly finding that one of the pictures belongs to you! And later finding out that it could end up in the hands of the Chinese and US militaries!
I have been a Flickr user since 2006 (Flickr is a photo hosting and sharing service that was very popular years ago), and at one time I had thousands of photos hosted there because for a while I used it as an automated photo backup. However, I had basically forgotten about the site until last March, when I read an article about a facial recognition dataset released in January by IBM called Diversity in Faces (DiF), ostensibly designed “to advance the study of fairness and accuracy in facial recognition technology” because it is diverse and very large: it contains 1 million human facial images extracted from another dataset of almost 100 million photos shared by their owners on Flickr under a Creative Commons license.
In the article, the authors included a tool to check whether a user’s images appear in the dataset, so I checked and, indeed, found that one picture I took of a friend of mine was in it (and a pretty bad one, I have to say). This got me thinking.
I took the picture in question in 2007, and it was automatically uploaded to Flickr along with many others under the Creative Commons (CC BY 2.0) license I had set as my default, which allows anyone to use my photos as long as they include attribution; indeed, some people have used them in the past, for example to illustrate blog posts.
The issues start here. My idea, and the idea of many other photographers, when choosing a CC license was to allow individuals, artists, and nonprofits to use my images for free in their creative endeavors; I definitely didn’t have IBM, one of the biggest corporations in the world, in mind when I made that decision. This can be viewed as a problem of informed consent and data ownership: my data is being repurposed by a third party without my explicit permission, and it will be repurposed even further by the entities that download the dataset and use it to build machine learning models for purposes entirely outside my knowledge and control.
IBM is addressing the problem of algorithmic fairness by curating and releasing this dataset specifically “to advance the study of fairness and accuracy in facial recognition technology”. The problems that facial recognition technologies have recognizing non-white people have been extensively documented, and the main reason is the lack of diversity in existing training datasets, so efforts like this can genuinely help. However, accuracy is only one side of the fairness problem; the other is discrimination, and, in the context of facial recognition, especially racial profiling. Making facial recognition systems more accurate at recognizing minorities can exacerbate the problems those minorities suffer because of intentional or unintentional discrimination by the owners of those systems.
Instances of discrimination powered by facial recognition have been extensively documented; for example, The New York Times and Wired recently published feature articles about “China’s Massive Surveillance Operation” against its Uyghur minority, and this particular example could be even closer to this case study than it may appear:
A few days after the article about the faces dataset was published, IBM implemented changes to address some of the privacy issues mentioned there, such as making it easier for photographers to opt out of the dataset. Afterward, they sent an email to all entities that had downloaded the dataset up to that point informing them of the changes, but they made a blunder and included all the email addresses in the CC field, allowing recipients to see who else had downloaded the dataset. According to Os Keyes, a Ph.D. student in the Data Ecologies Laboratory at the University of Washington, the recipients included email addresses associated with 10 companies that sell to the Chinese military, 14 companies that sell to the US military, one that works exclusively for the US military, and the Portuguese military. This means there is a small but real possibility that a picture I took years ago could end up helping train a facial recognition system used as a weapon by some of the biggest militaries in the world.
Talk about unintended consequences!