For years, the IRB community has sought metrics by which to define a “quality” IRB. This search at times has seemed as futile as the search for the Holy Grail. However, with the new National Institutes of Health (NIH) policy requiring a single IRB of record for multisite studies and a pending revision of the Common Rule requiring the same, the search has taken on renewed vigor.
Researchers and institutions that traditionally have been constrained to using just one or two IRBs now must choose among many. With this choice come questions: Which IRBs are performing well? How do we quantify an IRB’s performance? And these questions lead to the even more critical question: What do we mean by “quality” in the context of IRB review? The IRB community has been struggling to answer this question for decades, and perhaps it is time to acknowledge that the answer must include the balancing of a variety of performance metrics.
The Long Quest for a Quality Metric
Arguably, the search to define a quality IRB review began 20 years ago, when the Department of Health and Human Services (DHHS) Office of the Inspector General issued a report raising a warning and calling for reform among the thousands of US IRBs. The report identified problems with lack of capacity, lack of resources, lack of independence, and conflicts of interest. And the report concluded by pointing out that “IRBs rarely conduct inquiries to determine how well they are accomplishing their mission.”
In the two decades since the report, the IRB community has continued to struggle. Despite many efforts, a single definition of quality IRB review remains elusive. An NIH workshop on Alternative Methods of Review over ten years ago concluded that a key issue to resolve is “a clear understanding of what ‘quality’ really means in a review and how it can be measured.” A few years later, a review of 43 empirical studies of IRB processes and outcomes concluded that “[d]espite recognition of a need to evaluate effectiveness of IRB review, no identified published study included an evaluation of IRB effectiveness,” and “[a]dditional research is needed to understand . . . what quality IRB review is . . .” And just this past May, a group of experts in ethics and IRB administration gathered in Washington, D.C. to attempt again to articulate meaningful metrics for assessing the quality of IRB review.
A Comparison to Judicial Systems
Perhaps the IRB community could benefit by turning to scholarship developed in a similar field: judicial decision-making. Scholars have long struggled to articulate metrics for assessing the quality of court systems. A number of writers have pointed out that a single measurement is difficult when courts serve many interests: the interest of each party to a lawsuit, for example, as well as the overarching interests of a society that benefits from a reliable rule of law. As one report concludes:
The fact that courts do not have a single bottom line, a monochromatic audience (or one type of customer), or one product line to the same extent as some private sector organizations or many other public bodies, makes their performance difficult to measure and manage.
In light of these difficulties, a number of writers on judicial decision-making have called for reliance on a variety of factors to assess quality. A European council of judges, for example, asserted in an opinion that:
[T]he quality of judicial decisions is influenced by the quality of all the preparatory steps that precede them and therefore the legal system as a whole has to be examined. Moreover, seen from the perspective of the court users, it is not only the legal quality stricto sensu of the actual decision that matters; attention has also to be paid to other aspects such as the length, transparency, and conduct of the proceedings, the way in which the judge communicates with the parties and the way in which the judiciary accounts for its functioning to society.
More recently, the National Center for State Courts promoted a “Balanced Scorecard” approach, by which the performance of a court system would be measured by a variety of metrics from a variety of perspectives.
Similarly, IRBs struggle to articulate their effectiveness in part because they serve a variety of customers: the researcher who submits a study, the potential participants in that study, the members of society who might benefit from the outcome of the study, and the overarching interests of a society that benefits from the enforcement of ethical norms in research. With these different audiences, a metric that measures success for one audience (such as a researcher’s interest in study turnaround time) can be completely irrelevant to another audience (such as the potential participants’ interest in their rights and safety). Just as a court system is best measured as a whole with a variety of metrics, an IRB probably is best measured with metrics that capture the variety of functions critical to the IRB’s mission.
Choosing the Right IRB
Even if we conclude that a variety of metrics should be used to measure IRB quality, the question again arises: Which metrics should be considered? The IRB community already has a number of qualitative and quantitative measures to draw on. Accrediting organizations, such as the Association for the Accreditation of Human Research Protection Programs, Inc. (AAHRPP), have compiled standards and measurements designed to assess the effectiveness of an IRB system. For those IRBs that are not accredited, the Clinical Trials Transformation Initiative, a collaboration among institutions, federal agencies, and industry sponsors, has developed a checklist of issues for a local IRB to consider before deferring jurisdiction to a central IRB.
As a commercial, independent IRB, Quorum Review has spent years demonstrating its quality to researchers, auditors, study staff, accreditation site visitors, and local IRBs. We are proud of the expertise and training of our Board members, the consistency of our decisions, and our operational excellence. Whether your organization is seeking a quality IRB for a single study or a clinical research program, Quorum would be happy to share its quality metrics with you.
On a recent trip, I had the opportunity to contemplate the Achatschale, a large and ancient bowl cut with remarkable skill from a single piece of agate. For several centuries, this bowl was promoted as the Holy Grail. Whether or not it ever was the Holy Grail, the bowl is a thing of beauty and worthy of appreciation. The IRB community may never find a single metric to measure IRB quality, but in the meantime perhaps we can build a useful set of metrics to measure the many important facets of a functioning IRB organization.