Monday, September 17, 2007

New Sedona Conference comments lend credence to new search tools

Thank goodness there are lawyers out there willing and able to cull down commentary from the Sedona Conference into bite-sized morsels we can all consume. In this case, Ralph Losey has done a great job on his blog in summarizing the most recent commentary from the Sedona Conference, published in August of 2007.

What's interesting is the direct language from the Sedona Search Team, which all but admonishes against relying on simple keyword search technology for the review of electronically stored information (ESI):

. . . the experience of many litigators is that simple keyword searching alone is inadequate in at least some discovery contexts. This is because simple keyword searches end up being both over- and under-inclusive in light of the inherent malleability and ambiguity of spoken and written English (as well as all other languages). . . .

The problem of the relative percentage of “false positive” hits or noise in the data is potentially huge, amounting in some cases to huge numbers of files which must be searched to find responsive documents. On the other hand, keyword searches have the potential to miss documents that contain a word that has the same meaning as the term used in the query, but is not specified. . . .

Finally, using keywords alone results in a return set of potentially responsive documents that are not weighted and ranked based upon their potential importance or relevance. In other words, each document is considered to have an equal probability of being responsive upon further manual review.
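To make the over- and under-inclusive point concrete, here is a rough sketch of my own (it is not from the Commentary). The four documents, their "responsive" labels, and the function names are all made up for illustration. Precision is the share of hits that are actually responsive, and recall is the share of responsive documents that the search actually found.

```python
# Hypothetical mini-corpus: doc id -> (text, is the document responsive?)
# The dispute is assumed to be about a contract ending, not about employment.
DOCS = {
    1: ("The contract was terminated in March.", True),
    2: ("We agreed to end the agreement early.", True),    # same meaning, no keyword
    3: ("The employee was terminated for cause.", False),  # keyword, wrong sense
    4: ("Lunch menu for Friday.", False),
}

def keyword_search(docs, keyword):
    """Return the ids of documents whose text contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return {doc_id for doc_id, (text, _) in docs.items() if kw in text.lower()}

def precision_recall(hits, docs):
    """Compute (precision, recall) of a hit set against the responsive labels."""
    relevant = {doc_id for doc_id, (_, responsive) in docs.items() if responsive}
    true_positives = hits & relevant
    precision = len(true_positives) / len(hits) if hits else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall

hits = keyword_search(DOCS, "terminated")
precision, recall = precision_recall(hits, DOCS)
print(f"hits={sorted(hits)} precision={precision:.2f} recall={recall:.2f}")
# prints: hits=[1, 3] precision=0.50 recall=0.50
```

Document 3 is the false positive (noise) and document 2 is the miss, so even in this toy case the single keyword is both over- and under-inclusive.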

But the Sedona Search Commentary does not end on a negative note; instead it discusses new search technologies that will significantly improve upon the dismal recall and precision ratios of keyword searches:

Alternative search tools are available to supplement simple keyword searching and Boolean search techniques. These include using fuzzy logic to capture variations on words; using conceptual searching, which makes use of taxonomies and ontologies assembled by linguists; and using other machine learning and text mining tools that employ mathematical probabilities.
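As a very loose illustration of "capturing variations on words," here is a sketch using the approximate string matching in Python's standard difflib module. This is only my stand-in for the idea; it is not how any of the commercial tools work, and the vocabulary, the misspelling, and the 0.85 cutoff are assumptions picked for the example.

```python
import difflib

def fuzzy_hits(query, words, cutoff=0.85):
    """Return the words whose similarity to the query is at least the cutoff.

    difflib.get_close_matches scores each candidate with a similarity
    ratio between 0 and 1 and returns the matches, best first.
    """
    return difflib.get_close_matches(query, words, n=len(words), cutoff=cutoff)

# Hypothetical vocabulary pulled from a document set, including a typo.
vocabulary = ["terminated", "termniate", "agreement", "lunch"]

matches = fuzzy_hits("terminate", vocabulary)
print(matches)
```

An exact search for "terminate" would skip the misspelled "termniate", while the fuzzy match keeps both it and "terminated" and still leaves out unrelated words like "agreement". Note that it does nothing for true synonyms such as "end", which is where the conceptual search tools come in.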

The last tidbit Ralph brings to our attention is a call to action from the Team:

The legal community should support collaborative research with the scientific and academic sectors aimed at establishing the efficacy of a range of automated search and information retrieval methods.

Looking at these comments, albeit in a vacuum, it's astonishing to see the Team draw such a clear line in the sand. Clearly, reliance on simple keyword search isn't going to cut it for much longer. Vendors like Recommind, Engenium, Sygence, Content Analyst and the like will be drooling once word of this gets to them.

There's a lot more on Ralph's blog about this, and he writes much better than I do, so I encourage you to read the post in its entirety.
