Validating fuzzy logic values
They have implementations in lots of other languages on the Wikipedia page for it, too.

This question has been answered, but should you find any of the identified problems with SOUNDEX appearing in your application, it's nice to know there are options.
I have an existing and growing MySQL database of company names, each with a unique company_id.

During this process, commonly occurring words like 'the', 'and', etc. should be discarded. We then create several indices on the word table, as follows...

There's a UDF for it available here:

The downside to using Levenshtein is that it won't scale very well. A better idea might be to dump the whole table into a spell checker custom dictionary file and do the suggestion from your application tier instead of the database tier.

For example, someone writes:

I have found the following threads that seem similar to this question, but the poster has not approved an answer and I'm not sure if their use case is applicable:

How to find best fuzzy match for a string in a large string database
Matching inexact company names in Java

For more advanced needs, I think you need to look at the Levenshtein distance (also called "edit distance") of two strings and work with a threshold.
This is the more complex (=slower) solution, but it allows for greater flexibility.
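As a rough illustration of the "Levenshtein distance with a threshold" idea, here is a minimal Python sketch that runs in the application tier. The function names, the sample company names, and the default threshold of 2 are assumptions made for the example, not something taken from the original answers.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts,
    deletes, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(query, names, max_distance=2):
    """Return the names within max_distance edits of query, closest first.
    The threshold value is an assumption; tune it for your data."""
    query = query.lower()
    scored = [(levenshtein(query, name.lower()), name) for name in names]
    return [name for dist, name in sorted(scored) if dist <= max_distance]


print(fuzzy_matches("Microsft", ["Microsoft", "Micron", "Oracle"]))
```

Every candidate name is compared against the query, so the cost grows linearly with the table size. That is the scaling problem mentioned above.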
Once this is in place, we take any user input and search using a normal word = 'input' or word LIKE 'input%'.
We never do a LIKE '%input', as we are always looking for a match on any of the first 3 characters, which are all indexed.
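The word-table approach described above could be sketched roughly as follows. This uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server. The table name company_word, the index name, the stop-word list, and the sample companies are all hypothetical.

```python
import sqlite3

# Hypothetical stop-word list; the original only names 'the' and 'and'.
STOP_WORDS = {"the", "and", "of", "inc", "ltd"}


def build_word_index(conn, companies):
    """Split each company name into words and store (word, company_id) rows."""
    conn.execute(
        "CREATE TABLE company_word (word TEXT NOT NULL, company_id INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_company_word ON company_word (word)")
    rows = []
    for company_id, name in companies:
        for word in name.lower().split():
            word = word.strip(".,'\"()&")
            if word and word not in STOP_WORDS:
                rows.append((word, company_id))
    conn.executemany(
        "INSERT INTO company_word (word, company_id) VALUES (?, ?)", rows
    )


def search(conn, user_input):
    """Exact or prefix match only; never a leading-wildcard LIKE.
    Note: % and _ in user_input are not escaped in this sketch."""
    term = user_input.lower().strip()
    cur = conn.execute(
        "SELECT DISTINCT company_id FROM company_word "
        "WHERE word = ? OR word LIKE ? ORDER BY company_id",
        (term, term + "%"),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
build_word_index(conn, [
    (1, "The Acme Corporation"),
    (2, "Acme and Sons Ltd"),
    (3, "Apex Holdings"),
])
print(search(conn, "acm"))  # company ids whose words start with "acm"
```

In MySQL the same shape applies: an index on the word column (or on a prefix of it) can serve word LIKE 'input%', whereas a pattern that starts with a wildcard cannot use the index.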
Sometimes SOUNDEX can generate the same code for two really different words.
Another algorithm, called Metaphone, was created to help take care of that problem, and it was later revised into the Double Metaphone algorithm.
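To make the collision problem concrete, here is a small Python sketch of the commonly described American Soundex rules. It is an illustration only: MySQL's built-in SOUNDEX() may differ in details such as code length, and the sample names are assumptions chosen for the example.

```python
# Consonant groups used by American Soundex.
CODES = {}
for digit, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex code, or "" if word has no letters."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    last = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != last:
            result += code
        last = code  # a vowel resets last, so a repeated code is kept
    return (result + "000")[:4]


for name in ("Robert", "Rupert", "Catherine", "Cotroneo"):
    print(name, soundex(name))
```

"Robert" and "Rupert" both come out as R163, and "Catherine" and "Cotroneo" both come out as C365. The code keeps only the first letter and a few consonant classes, which is why unrelated names can collide.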