A study of “A Study of Face Obfuscation in ImageNet”

Vinay Prabhu
Mar 15, 2021 · 13 min read

Table of Contents:

— — — — — — — — — — — — — — — — — — — — — — — -

· Background
· Main Issues
Issue-1: The curious case of Face obfuscation
Issue-2: NSFW analysis
Issue-3: Human co-occurrence analysis
👹FAQs of the ‘Devil’s advocacy’ kind: Our humble tribute to the cult of “Both-sideism”
👹1: Why not just contact them in lieu of public grandstanding?
👹2: Maybe your little emails slipped through the cracks?
👹 3: Well, OK. But PL is not the sole author. How do you know all the co-authors and the collaborators in the acknowledgment were even aware of your work?!
👹 4: In the Wired interview published on March 15th, when pressed by the reporter, one of the authors states that “a citation will appear in an updated version of the paper”. Doesn’t that solve the problem?
· Concluding thoughts: The real issues
a) Erasure of Black women's scholarship:
b) Revisiting the horrors of ghost labor:
— — — — — — — — — — — — — — — — — — — — — — — -

Background

On June 24, 2020, Abeba Birhane and I released our paper “Large image datasets: A pyrrhic win for computer vision?” critiquing the culture of large-scale datasets in Computer Vision. In the paper, we performed a cross-sectional, model-based quantitative census covering factors such as age, gender, NSFW content scoring, class-wise accuracy, human-cardinality analysis, and the semanticity of the image class information in order to statistically investigate the extent and subtleties of ethical transgressions, using the ImageNet dataset as a template. The nature and the expanse of the transgressions attracted quite some media attention (see this, this and this). In Section 2.3 of our paper, we revisited the downstream effects of “The WordNet Effect” (which results from inheriting labels from the WordNet taxonomy) and showed how this affects not just the ImageNet dataset but also other datasets, such as the Tiny Images dataset and the more recent Tencent-ML-Images dataset, that either directly or indirectly inherited their label space from WordNet. On June 29, 2020, we learnt that the curators of the Tiny Images dataset had apologized and withdrawn the dataset.
In January 2021, the paper was formally presented at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2021) and has been cited in more than two dozen papers since.
Against the backdrop of all of this work, this Wednesday, March 10, 2021, we encountered a paper titled A Study of Face Obfuscation in ImageNet from the ImageNet curators that has left us disappointed and flummoxed. By indulging in what appears to be a calculated and systematic erasure of the entire body of critique that our work was a part of, the authors have sent out a wide range of wrong signals. This erasure is doubly disappointing given how the community had recently rallied behind the main visionary of the ImageNet project when her contributions towards the “AI revolution” were being erased in an online compendium, and given the sheer clout she enjoys in the field.

Left: Version of Brief History of Deep Learning from 1943–2019 [Timeline] on Apr 23, 2020. Right: Version today after the community uproar driven by Dr. Gebru’s tweet

Below, we bemoan this unfortunate departure from the norms of academic integrity by carefully disentangling the specific details that characterize the situation from our standpoint. In doing so, we are sharing exact snapshots of the conversation(s) that unfolded between the parties involved.
Pre-script: The authors of the paper Large image datasets: A pyrrhic win for computer vision? (and of this blog post you are reading) are abbreviated as VP (Vinay Prabhu) and AB (Abeba Birhane) respectively in the rest of the material presented here. PL refers to the main visionary of the ImageNet dataset. Wherever relevant, ‘our paper’ refers to Large image datasets: A pyrrhic win for computer vision? and ‘their paper’ refers to A Study of Face Obfuscation in ImageNet.

Main Issues

Issue-1: The curious case of Face obfuscation

Background: Section 4 (Candidate solutions: The path ahead) of our paper was perhaps the most difficult for us to author. We knew we were submitting to WACV (on the stubborn insistence of VP), an unlikely venue for ‘fairness papers’, whose submission portal did not even have a primary or secondary topic for “Explainable AI, fairness, accountability, privacy, and ethics in vision”, though the topic did receive a mention in the CFP (see screenshot below).

Screenshot of the email VP wrote to the WACV organizers. We received no reply to this email.

Our goal, simply put, was to engage directly with practitioners in the field and not just the ethics community. And in many ways it did culminate in a rather “lively discussion” when we presented the paper at WACV in a session chaired by Jordi Pont-Tuset, a Research Scientist @ Google Zürich.
At this juncture, we’d like to share that AB, along with many of our other colleagues and pre-reviewers, rightfully questioned the very need for the section, as they felt it reeked of tech-solutionism. Nonetheless, predicting the clamor for ‘possible solutions’ from the reviewers of this traditional Computer Vision conference (an assumption eventually proven correct), the section persisted. In this regard, we’d like to draw the reader’s attention to Section 4.3 of our paper, which is literally titled “Differentially private obfuscation of the faces”, where we state: ”This path entails harnessing techniques such as DP-Blur [36] with quantifiable privacy guarantees to obfuscate the identity of the humans in the image. The Inclusive images challenge [94], for example, already incorporated blurring during dataset curation and addressed the downstream effects surrounding change in predictive power of the models trained on the blurred versions of the dataset curated. We believe that replication of this template that also clearly included avenues for recourse in case of an erroneously non-blurred image being sighted by a researcher will be a step in the right direction for the community at large”. As evinced by the papers we cited, privacy-preserving obfuscation of images is neither a novel idea nor, most certainly, our idea. But, in the specific context of imagining a face-obfuscated version of ImageNet, it is reasonable to expect that anyone authoring a paper audaciously titled “A Study of Face Obfuscation in ImageNet” would pay at least lip service to citing either our work and/or reference [94] from our paper, which is:
[94] Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, and D Sculley. No classification without representation: Assessing geodiversity issues in open data sets for the developing world. arXiv preprint arXiv:1711.08536, 2017.
But the authors chose not to cite this one either. Their paper begins with “Image obfuscation (blurring, mosaicing, etc.) is widely used for privacy protection. However, computer vision research often overlooks privacy by assuming access to original unobfuscated images” (Like, really?! 🙄) and goes on to claim that they have discovered that “… the dataset exposes many people co-occurring with other objects in images, e.g., people sitting on chairs, walking their dogs, or drinking beer (Fig. 1). It is concerning since ILSVRC is publicly available and widely used.” 🥴

Also, we’d like to ask readers to take two minutes to parse these FAQs from the associated Kaggle contest ([94] in our paper), dating back more than two years, and then read their paper again 🤐

Source: https://www.kaggle.com/c/inclusive-images-challenge/overview/inclusive-images-faq#recognizable-faces
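For concreteness, here is a minimal sketch of what face obfuscation can look like in practice: ordinary Haar-cascade face detection followed by Gaussian blurring with OpenCV. This is neither DP-Blur [36] nor the pipeline used in either paper (it carries no differential-privacy guarantee), and the file paths are hypothetical.

```python
# Minimal face-blurring sketch using OpenCV's bundled Haar cascade.
# This is ordinary Gaussian blurring, NOT DP-Blur: no formal
# differential-privacy guarantee; shown purely for illustration.
import cv2

def blur_faces(image_path: str, out_path: str, kernel: int = 51) -> int:
    """Detect faces, blur them, write the result; returns the number of faces blurred."""
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        # Replace each detected face region with a heavily blurred version.
        img[y:y + h, x:x + w] = cv2.GaussianBlur(
            img[y:y + h, x:x + w], (kernel, kernel), 0
        )
    cv2.imwrite(out_path, img)
    return len(faces)

# Hypothetical usage:
# n = blur_faces("ILSVRC2012_val_00000001.JPEG", "blurred.JPEG")
```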

Issue-2: NSFW analysis

The term NSFW appears 38 times in our paper, and we not only curated a class-wise meta-dataset (df_nsfw.csv | Size: (1000, 5)) consisting of the mean and std of the NSFW scores of the train and validation images arranged per class, but also dedicated Appendix B.2 to “NSFW scoring aided misogynistic imagery hand-labeling”. In Table 5, we specifically focus on classes 445, 638, 639, 655 and 459, mapping to bikini, two-piece, maillot, miniskirt and brassiere/bra/bandeau in the dataset, which we found to be NSFW-dense classes.

Again, much to our disappointment, the authors claim to have discovered that: ”The number of NSFW areas varies significantly across different ILSVRC categories. Bikini is likely to contain much more NSFW areas than the average.” 😒
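To make the meta-dataset described above concrete, here is a minimal sketch of how one might surface the NSFW-dense classes from df_nsfw.csv; the column names below are assumptions made for illustration and may not match the released file exactly.

```python
# Sketch: rank ImageNet classes by mean NSFW score using the (1000, 5)
# per-class meta-dataset described above. Column names are assumed here
# (class_id, class_label, mean_nsfw_train, std_nsfw_train, ...) and may
# differ from the actual df_nsfw.csv released with the paper.
import pandas as pd

df = pd.read_csv("df_nsfw.csv")  # 1000 rows, one per ImageNet class
top = df.sort_values("mean_nsfw_train", ascending=False).head(10)
print(top[["class_id", "class_label", "mean_nsfw_train", "std_nsfw_train"]])
# Classes such as 445 (bikini), 638/639 (maillot), 655 (miniskirt) and
# 459 (brassiere) are the kind of NSFW-dense classes flagged in Table 5.
```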

Issue-3: Human co-occurrence analysis

In our paper, we dedicated Section B.3 (Dogs to musical instruments: Co-occurrence based gender biases) to human co-occurrence biases, specifically with regard to the dog-breed and musical-instrument classes that have a high density of incidentally co-occurring humans. Their new paper states: “Results suggests that super categories such as clothing and musical instrument frequently co-occur with people” 🤦
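As a rough illustration of what a human co-occurrence census entails, here is a minimal sketch that estimates, per class, the fraction of images containing at least one detected person. The directory layout and the choice of a pretrained torchvision detector are assumptions for illustration, not the pipeline used in either paper.

```python
# Sketch: per-class human co-occurrence via a pretrained person detector.
# Directory layout (one folder per class) and the torchvision model choice
# are illustrative assumptions, not the pipeline used in either paper.
from pathlib import Path
import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = fasterrcnn_resnet50_fpn(pretrained=True).eval()
PERSON = 1  # COCO label id for "person"

def class_cooccurrence(class_dir: Path, thresh: float = 0.8) -> float:
    """Fraction of images in class_dir containing at least one detected person."""
    hits, total = 0, 0
    for img_path in class_dir.glob("*.JPEG"):
        total += 1
        with torch.no_grad():
            out = model([to_tensor(Image.open(img_path).convert("RGB"))])[0]
        person_found = any(
            lbl == PERSON and score >= thresh
            for lbl, score in zip(out["labels"].tolist(), out["scores"].tolist())
        )
        hits += int(person_found)
    return hits / max(total, 1)

# Hypothetical usage over, e.g., a dog-breed class folder:
# print(class_cooccurrence(Path("ILSVRC/train/n02085620")))  # Chihuahua
```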

👹FAQs of the ‘Devil’s advocacy’ kind: Our humble tribute to the cult of “Both-sideism”

Given the attention this might elicit, we pre-emptively anticipate the exact flavor of attacks and cover the following “Devil’s advocacy counter-points” in the section below:

👹1: Why not just contact them in lieu of public grandstanding? Have you bothered to even contact the curators of the ImageNet dataset?

Yes! Glad you asked. Here are the screenshots of our emails, dating all the way back to August 19, 2019 and again on April 12, 2020, to which we received no replies whatsoever:

Understanding the magnitude of the impact and being wary of any possible Streisand effect, we spent the entirety of the near-10-month period between August 2019 and June 2020 on various outreach efforts amongst journalists, Computer Vision and Ethics communities, and organizations. This also involved VP working with journalists such as Katyanna Quach at The Register, who then authored this article: Inside the 1TB ImageNet data set used to train the world’s AI: Naked kids, drunken frat parties, porno stars, and more

👹2: Oh come on! Stop with the self-aggrandizing and self-loathing. AI royalty tend to receive hundreds of emails a day. Maybe your little emails slipped through the cracks?

Again, glad you asked!
Lemma-1: PL was *extremely* well aware of the work.
Proof: The paper we published draws heavily from my talk “On the four horsemen of ethical malice in peer reviewed machine learning literature”, given under the aegis of the Stanford-HAI weekly seminars (thanks to an invite from Colin Kelley Garvey, an AI ethicist) on April 17, 2020, 11:00am–12:00pm. On April 15, I received this email from a HAI coordinator stating that “I just spoke with HAI Co-Director, Fei-Fei Li, and she would like to come on screen after you finish your talk and ask you a few before Colin gives you questions from the audience. Please let me know if you are comfortable with this request”.

This was followed by the first-ever communication that PL voluntarily initiated with me; a screenshot is below.

This was followed by a delayed reply on April 17 that read…

And lastly, here is the actual video of our Zoom face-to-face meeting 📹 https://youtu.be/hpA67iDxNGU
Q.E.D.!
👹 3: Well, OK. But PL is not the sole author. How do you know all the co-authors and the collaborators in the acknowledgment were even aware of your work?!

Because they have literally cited us just recently. In their paper titled “REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets”, the authors contextualize our work thus: “Recent work [51] has looked at dataset issues related to consent and justice, and motivate enforcing Institutional Review Boards (IRB) approval for large scale datasets.” A reductionist take on our work, but proof of awareness nonetheless!

👹 4: In the Wired interview published on March 15th, when pressed by the reporter, one of the authors states that “a citation will appear in an updated version of the paper”. Doesn’t that solve the problem?

Again: this blog is not about citation-seeking. We’d like to clearly point out that the biggest shortcomings are the tactical abdication of responsibility for all the mess in ImageNet, combined with the systematic erasure of the related critical work that may well have prompted these corrective measures in the first place.
The authors tactically left out an entire body of literature critiquing ImageNet, beginning with the ImageNet audits by Chris Dulhanty and Alexander Wong (not to mention Chris’ entire thesis) and more recent data-archaeological expeditions such as Lines of Sight by Alex Hanna et al. This shouldn’t come as a surprise to anybody, because in their last inquisition into the Person subtree (where they admitted that, of the 2832 people categories annotated within the subtree, 1593 were potentially offensive labels and only 158 were visual), they made no mention of the hugely influential ImageNet Roulette project (which went viral on September 19, 2019, while their paper only hit the arXiv servers on December 16, 2019!). Also, lest we forget, these solutions are being ushered in a good 12 years after the dataset’s release. T-W-E-L-V-E YEARS!

Concluding thoughts: The real issues

a) Erasure of Black women's scholarship:

AB’s central role in turning a rag-tag set of empirical results and revelations into a cogent, peer-review-worthy publication, and later investing all the effort to champion its cause via talks, interviews and presentations, is one of the main reasons why the paper is even being cited now. The primacy of her contributions is also reflected in the official citation, which literally reads:
@inproceedings{birhane2021large,
  title={Large Image Datasets: A Pyrrhic Win for Computer Vision?},
  author={Birhane, Abeba and Prabhu, Vinay Uday},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={1537--1547},
  year={2021}
}
But, unfortunately for her, the undervaluing of her scholarship is not an aberration but a trend. Black women’s intellectual production has historically been ignored and systemically erased. The hierarchical academic structure that devalues Black women’s intellectual contributions makes contesting such injustice a tiresome endeavor, discouraging Black women scholars from coming forward. Black feminist theory scholars such as Jennifer Nash have extensively explored the Citational Desires of scholars whose contributions have been systematically under-emphasized. Initiatives such as the Cite Black Women collective (https://www.citeblackwomencollective.org/) work towards dismantling precisely this behavior in academia, and it is unfortunate to see it reinforced by highly esteemed scholars who are supposed to be the torchbearers of hope.

b) Revisiting the horrors of ghost labor:

During our draft revisions, specifically of Section 4, AB and I were in the midst of a ‘How do we go about fixing this morass?’ conversation when we realized that, in order to truly clean up the dataset, we’d be forced to make two massive compromises:

  • Resort to using the unethical “SoTA” tools from companies like Amazon, Face++ or Clarifai to perform face detection and filter the problematic images
  • Resort to exploiting the ghost labor markets of AMT to hand-annotate the NSFW facet of the dataset.

As it turns out, on the very same day that the Turkopticon fundraising campaign was announced, mere hours later, we see this paper falling prey to both of these ills. In fact, the gamified HIT (Human Intelligence Task) description reads 🤢: “These images have verified ground truth faces, but we intentionally show incorrect annotations for the workers to fix. The entire HIT resembles an action game. Starting with 2 lives, the worker will lose a life when making a mistake on gold standard images. In that case, they will see the ground truth faces (Fig. B Right) and the remaining lives. If they lose both 2 lives, the game is over, and they have to start from scratch at the first image. We found this strategy to effectively retain workers’ attention and improve annotation quality.”

(Also see https://www.vice.com/en/article/88apnv/underpaid-workers-are-being-forced-to-train-biased-ai-on-mechanical-turk )
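For clarity, here is a minimal sketch of the quoted “lives” mechanic as we read it; this is purely our paraphrase in code, with hypothetical callbacks for the worker and gold-standard checks, and not the authors’ actual HIT implementation.

```python
# Minimal sketch of the quoted "lives" mechanic as we read it; our paraphrase
# in code, not the authors' actual HIT implementation. The callbacks
# (worker_annotates, is_gold, check_gold) are hypothetical.
def run_hit(images, is_gold, worker_annotates, check_gold, lives=2):
    """Restart from the first image whenever the worker exhausts both lives."""
    while True:
        remaining = lives
        for img in images:
            ann = worker_annotates(img)
            if is_gold(img) and not check_gold(img, ann):
                remaining -= 1      # mistake on a gold-standard image
                if remaining == 0:  # game over: start from scratch
                    break
        else:
            return  # all images annotated without losing both lives
```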

To conclude, we say:
- This is NOT us desperately hoping to drum up some antics to garner more attention
- This is NOT us trying to eke out one more citation
- This is NOT us assuming the proverbial higher pedestal and judging anyone
- This is NOT an ad hominem attack on any member of the ImageNet team.
- This IS us calling out a pattern of citation erasure (with specific, verifiable proofs) and highlighting the ethical shortcomings of a paper that will probably be extremely well cited in the near future and, much worse, celebrated (wrongly, IMHO) as a template for stop-gap fixes.
We call upon the curators of the dataset to pay heed to the issues raised and take corrective measures.

Kindest regards,

Abeba Birhane and Vinay Prabhu

PS: If all of this is confusing, here is the VERIFIABLE timeline of events to summarize what happened.
1: 19 Aug 2019: Contacted ImageNet curators via email. No response.
2: Sep 2019: Chatted with Katyanna Quach at The Register to research the specific details regarding ImageNet for an impending article.
3: 23 Oct 2019: The Register article comes out: https://www.theregister.com/2019/10/23/ai_dataset_imagenet_consent/
4: 12 Apr 2020: Second email to the ImageNet curators. No response.
5: 15 Apr 2020: PL contacts me via email.
6: 25 Apr 2020: Talk at Stanford that PL attends, titled “Ethical Malice in Peer-Reviewed Machine Learning Literature” (video link included above).
7: Jun 2020: The first version of our paper appears on arXiv: https://arxiv.org/abs/2006.16923
8: Mar 2021: PL et al. publish “A Study of Face Obfuscation in ImageNet” sans any citation or acknowledgement.

Vinay Prabhu

PhD, Carnegie Mellon University. Chief Scientist at UnifyID Inc