Monday, August 27, 2007

AI in DB Record Matching (MPI)

What is Fuzzy Logic?
Fuzzy logic is a term that describes the concept of "maybe", where things aren't always yes or no, 1 or 0. This third alternative of maybe is the area in which humans operate on a daily basis. We look to computers all the time for definitive information, and we either find what we're looking for or we don't. What fuzzy logic does is allow for the third alternative: just because something is not a definite Yes doesn't mean it's a No. Most modern languages now let you define custom variable types. Pascal has had that ability for over 30 years, making Pascal a language ahead of its time. With custom-defined types you can return results to calling functions in a natural, readable way that makes sense in the context of fuzzy logic.
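
To make the custom-type idea concrete, here is a minimal sketch in Python rather than Pascal. The names Match and classify and the two cut-off values are assumptions made up for illustration; they are not taken from the parser described in this post.

```python
from enum import Enum


class Match(Enum):
    """A three-valued answer: a definite no, a definite yes, or a maybe."""
    NO = "no"
    MAYBE = "maybe"
    YES = "yes"


def classify(confidence: float, no_below: float = 0.4, yes_from: float = 0.85) -> Match:
    """Map a confidence in [0, 1] onto a readable three-valued result.

    The default cut-offs are illustrative assumptions, not tuned values.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= yes_from:
        return Match.YES
    if confidence < no_below:
        return Match.NO
    return Match.MAYBE
```

A caller can then write something like `if classify(score) is Match.MAYBE:` and route the record to a human for review, which reads more naturally than testing a bare number.
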
I finished integrating a fuzzy logic parser that I wrote some years ago. All that remained was to put the cut-off thresholds in the main program, call the algorithms, and see if they were up to the job. It took a while to get the thresholds fine-tuned for the different types of data, but the algorithms are sound. With fuzzy logic parsing you can determine what type of data is contained in a field (such as a phone number, SSN, driver's license number, or birth date), and you can compare two values and get a confidence level back. By adjusting the acceptable confidence window you can fine-tune your data matching.
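
The two operations described above, guessing a field's type and comparing two values for a confidence level, might be sketched roughly as follows. This is not the parser from this post: the regular expressions assume US-style formats, the function names are hypothetical, and the similarity measure is the ratio from Python's standard difflib.SequenceMatcher, swapped in as a stand-in for whatever the real algorithms do.

```python
import re
from difflib import SequenceMatcher

# Hypothetical patterns for a few US-style field formats (assumptions).
PATTERNS = {
    "ssn": re.compile(r"\d{3}-?\d{2}-?\d{4}"),
    "phone": re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"),
    "date": re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}"),
}


def guess_field_type(value: str) -> str:
    """Return the name of the first pattern matching the whole value."""
    text = value.strip()
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(text):
            return name
    return "unknown"


def confidence(a: str, b: str) -> float:
    """Similarity of two values in [0, 1], ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Accept the pair when the confidence reaches the cut-off threshold."""
    return confidence(a, b) >= threshold
```

Raising the threshold argument narrows the acceptable confidence window and cuts false positives; lowering it catches more near-misses. In practice each data type would likely get its own cut-off.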

I was able to quickly build a database record analyzer to match data in different tables and to find duplicate records. The results are uncanny, and the more data elements you throw into the mix, the better it does. It nailed name misspellings, date transpositions, and addresses that were written differently, like "7th street" vs. "Seventh St." It even catches similar-sounding names, performing very close to Soundex but without the high number of false positives.
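
As a rough illustration of how several fields can be combined into one record-level confidence, here is a small Python sketch. Everything in it is an assumption made for the example: the tiny address vocabulary, the field weights, the 0.85 threshold, and the use of difflib.SequenceMatcher as the per-field comparison. It does not attempt the similar-sounding-name matching mentioned above.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Tiny, hypothetical vocabulary for spelling out address variants.
ADDRESS_WORDS = {"st": "street", "ave": "avenue", "7th": "seventh"}


def normalize(text: str) -> str:
    """Lower-case, drop periods and commas, and expand known abbreviations."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return " ".join(ADDRESS_WORDS.get(word, word) for word in words)


def field_score(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def record_score(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted average of the per-field scores for the weighted fields."""
    total = sum(weights.values())
    return sum(w * field_score(r1[f], r2[f]) for f, w in weights.items()) / total


def find_duplicates(records: list, weights: dict, threshold: float = 0.85) -> list:
    """Return index pairs whose combined score reaches the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if record_score(a, b, weights) >= threshold
    ]


records = [
    {"name": "John Smith", "dob": "1970-03-12", "address": "12 7th street"},
    {"name": "Jon Smith", "dob": "1970-03-21", "address": "12 Seventh St."},
    {"name": "Maria Garcia", "dob": "1985-11-02", "address": "400 Oak Ave"},
]
weights = {"name": 2, "dob": 1, "address": 1}
duplicates = find_duplicates(records, weights)  # [(0, 1)]
```

The first two records differ by a misspelled name, a transposed date, and a differently written address, yet they still pair up because every field contributes evidence. Comparing all pairs is quadratic, so a real table would need some form of pre-grouping before scoring.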

It has been really fun to work on some of these projects that I designed years ago but never got the chance to implement. A little vision goes a long way.