In the WIRED article “Scientists Are Just As Confused About The Ethics Of Big-Data Research As You,” Sarah Zhang identifies a disconnect between what ethics committees (institutional review boards, or IRBs) formally review and what is likely to provoke public outrage. The article reminds us of the unfinished business of developing reasonable review criteria for social science research, and it touches on the questions raised by big data’s ability to violate traditional expectations of privacy. But the article is misleading in at least one respect, and it does not do justice to the underlying ethical questions.
IRBs Have No Jurisdiction over Big Data
Federal regulations charge IRBs with ensuring that research involving human subjects is conducted ethically. But most big data studies, including the OkCupid and Facebook studies discussed in the WIRED article, fall outside an IRB’s jurisdiction.
The article correctly notes that IRB review is required (and routine) for federally funded research. But big data research is usually funded by corporations, and the article never explicitly states the consequence of that structure: when an IRB has no authority, its review standards and technical expertise matter little.
It’s Not Medical Research vs. Social Science Research
Second, the article states that research ethics regulations are ill-suited to social science because they originated in response to medical research. That account is not quite accurate.
Discussion of the ethical issues behind the regulations predates the reaction to the Tuskegee syphilis study. Tuskegee’s deception forced action, but it did not by itself frame the debate; that framing began at least as early as the Nuremberg Trials.
Unlike the doctors involved in the Tuskegee study, Nazi doctors never offered even the pretense of a doctor-patient relationship. Their experiments were biological, not medical. What made these “experiments” so horrific was the violation not of medical norms but of ethical norms, particularly those concerning autonomy.
It’s the Type of Harm
Placing involuntary subjects at risk of death or long-term disability was so blatant a violation of societal norms that it demanded action. Our current review system is in part a response to those egregious harms.
When it comes to protecting research participants, the biggest difference between social science research and medical research is not that the latter is “medical” or conducted by physicians. The fundamental difference lies in the degree and nature of the harm to which participants are exposed. Social scientists are rightly dissatisfied that our regulations do not sufficiently recognize that difference; that failure is the root of their complaint.
Which brings us back to big data research. Even if all such research were subject to ethical review, and even if every IRB had members with expertise in data security, we would still need to identify the harms to which research participants (or perhaps “subjects” in this setting, since participation is arguably involuntary) are exposed, and to decide what degree of oversight those harms warrant.
The Harms of Big Data Research
I see two kinds of direct harms. The first is loss or violation of privacy. As the WIRED article discusses, big data can reconstruct a disturbingly complete picture of an individual by cross-linking databases that in isolation are effectively anonymous. Such data could include not only spending habits but also medical history, sexual orientation, debt, and other characteristics we feel are ours to disclose.
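To make the cross-linking mechanism concrete, here is a minimal Python sketch of a linkage (re-identification) attack. It assumes two hypothetical datasets: a health extract stripped of names but still carrying ZIP code, birth date, and sex, and a public roster that lists names alongside the same three fields. All records, field names, and the `link` function are invented for illustration; real re-identification operates on far larger data and often uses looser matching.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: every record and field name here is invented for illustration.
# Dataset A: a "de-identified" health extract (no names, but quasi-identifiers remain).
health_records: List[Dict[str, str]] = [
    {"zip": "02138", "birth_date": "1961-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1975-03-02", "sex": "M", "diagnosis": "asthma"},
]

# Dataset B: a public roster that has names but nothing sensitive.
voter_roll: List[Dict[str, str]] = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1961-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02140", "birth_date": "1980-11-12", "sex": "M"},
]

# Fields that are harmless alone but identifying in combination (assumed for this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def quasi_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Build the join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(health: List[Dict[str, str]], roll: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Join the two datasets on quasi-identifiers; a unique match re-identifies a record."""
    names_by_key: Dict[Tuple[str, ...], List[str]] = {}
    for person in roll:
        names_by_key.setdefault(quasi_key(person), []).append(person["name"])

    linked: List[Tuple[str, str]] = []
    for record in health:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # only an unambiguous match counts as re-identification
            linked.append((candidates[0], record["diagnosis"]))
    return linked


if __name__ == "__main__":
    for name, diagnosis in link(health_records, voter_roll):
        print(f"{name}: {diagnosis}")  # prints "Alice Example: hypertension"
```

Neither toy dataset reveals anything sensitive about a named person on its own; the disclosure appears only when the two are joined, which is the sense in which separately "anonymous" databases stop being anonymous once cross-linked.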
When they review research on medical records, which does fall within their jurisdiction, IRBs do discuss the risks of privacy violations, but they focus primarily on the downstream harms that follow from disclosure rather than on the privacy violation itself.
The particulars of technology are a distraction. A minimum level of security must be met, but experience has taught us that we can reduce (or, in the words of the regulations, “minimize”) these risks without ever making them go away. Cross-linking and identification that are impractical today will almost certainly be possible tomorrow, if someone finds them worth the effort. Technical protections may delay the day when we have to confront the more basic ethical question, but eventually we will have to discuss it.
Or perhaps not. Perhaps the point is moot. While we debate the importance of privacy in the research setting, the current internet economy is built on the business value of cross-linking data, and such cross-linking happens every time we use a credit card or run a Google search.
In a subsequent post I will discuss the second potential harm I see in big data research, one that is more difficult to name and, perhaps, to rationally justify.