Tuesday, September 9, 2014

eRecall: No Free Lunch

By Bill Dimm
There has been some debate recently about the value of the “eRecall” method compared to the “Direct Recall” method for estimating the recall achieved with technology-assisted review. This article shows why eRecall requires sampling and reviewing just as many documents as the direct method if you want to achieve the same level of certainty in the result.
eRecall is defined by the equation:
eRecall = (TotalRelevant – RelevantDocsMissed) / TotalRelevant
Rearranging a little:
eRecall = 1 – RelevantDocsMissed / TotalRelevant = 1 – FractionMissed * TotalDocumentsCulled / TotalRelevant
Computing eRecall requires estimating two quantities by sampling: the total number of relevant documents, and the fraction of the culled documents that are relevant, which gives the number of relevant documents that were culled by the TAR tool.
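To make the arithmetic concrete, here is a minimal Python sketch of the rearranged equation. The function name, parameter names, and all of the numbers in the example are hypothetical illustrations, not figures from any real review. It assumes TotalRelevant is estimated from a random sample of the whole population and FractionMissed from a separate random sample of the culled documents.

```python
def estimate_erecall(population_size, prevalence_sample_size, prevalence_sample_relevant,
                     culled_size, culled_sample_size, culled_sample_relevant):
    """Point estimate of eRecall from two random samples (illustrative sketch)."""
    if prevalence_sample_size <= 0 or culled_sample_size <= 0:
        raise ValueError("sample sizes must be positive")
    if prevalence_sample_relevant <= 0:
        raise ValueError("need at least one relevant document in the prevalence sample")

    # TotalRelevant: prevalence seen in the population sample, scaled to the population.
    total_relevant = population_size * prevalence_sample_relevant / prevalence_sample_size

    # FractionMissed: share of the culled-set sample that turned out to be relevant.
    fraction_missed = culled_sample_relevant / culled_sample_size

    # eRecall = 1 - FractionMissed * TotalDocumentsCulled / TotalRelevant
    return 1.0 - fraction_missed * culled_size / total_relevant


# Hypothetical numbers: 1,000,000 documents, 900,000 of them culled,
# and 1,000 documents reviewed in each of the two samples.
example = estimate_erecall(
    population_size=1_000_000,
    prevalence_sample_size=1_000,
    prevalence_sample_relevant=100,   # implies about 100,000 relevant documents
    culled_size=900_000,
    culled_sample_size=1_000,
    culled_sample_relevant=10,        # implies about 9,000 relevant documents missed
)
print(f"{example:.2f}")  # 0.91
```

Note that this returns only a point estimate. Both inputs come from samples, so each carries its own sampling uncertainty, and the sketch makes no attempt to compute a confidence interval for the result.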