Fuzzy Text Matching

We provide a Groovy class containing several algorithms, including Cosine Similarity, to find matching items in a list based on a user input. This class is especially useful for finding matching items in lists that are dynamically populated, which might rule out the use of custom entities.

Installation

Add the file FuzzySearch.groovy to the Resources in your solution and set the path to /script_lib.

For more details on managing files in your solution, see Resource File Manager.

Algorithms

We provide three fuzzy search algorithms: Cosine Similarity (based on n-grams), Edit Distance and Word Count.

You can read more about this topic in the following knowledge article: Fuzzy Search in Teneo.

Usage

You can call the FuzzySearch class from any script in Teneo Studio, for example in script nodes, in listeners, or in script conditions on transitions. The methods are called as follows:

Use Cosine Similarity

FuzzySearch.mostSimilarByCosineSimilarity(String pattern, List candidates, double threshold, int degree)

The mostSimilarByCosineSimilarity methods have the following arguments:

Argument     Description
pattern      The input string
candidates   The possible matches
threshold    The matching threshold, a value between 0 and 1
degree       The n-gram degree, an integer with default value 2
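To show roughly what threshold and degree control, here is a minimal Java sketch of cosine similarity over character n-grams (Groovy runs on the JVM, so the logic carries over). It is not the code in FuzzySearch.groovy: the class name CosineSimilaritySketch, lower-casing the input, and not padding the string before extracting n-grams are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the FuzzySearch.groovy implementation.
// Assumes lower-cased input and no padding around the string.
class CosineSimilaritySketch {

    // Count the character n-grams of length `degree` in the lower-cased text.
    static Map<String, Integer> ngrams(String text, int degree) {
        String s = text.toLowerCase();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + degree <= s.length(); i++) {
            counts.merge(s.substring(i, i + degree), 1, Integer::sum);
        }
        return counts;
    }

    // Euclidean length of an n-gram count vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int c : v.values()) sum += (double) c * c;
        return Math.sqrt(sum);
    }

    // Cosine of the angle between the two n-gram vectors, between 0 and 1.
    static double similarity(String a, String b, int degree) {
        Map<String, Integer> va = ngrams(a, degree);
        Map<String, Integer> vb = ngrams(b, degree);
        double dot = 0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            dot += e.getValue() * vb.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(va);
        double normB = norm(vb);
        if (normA == 0 || normB == 0) return 0;
        return dot / (normA * normB);
    }

    // Candidates whose score is greater than the threshold, closest match first.
    static List<String> mostSimilar(String pattern, List<String> candidates,
                                    double threshold, int degree) {
        Map<String, Double> scores = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String c : candidates) {
            double score = similarity(pattern, c, degree);
            if (score > threshold) {
                result.add(c);
                scores.put(c, score);
            }
        }
        result.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return result;
    }
}
```

A higher degree makes the n-grams longer, so candidates need longer shared character runs to score well; a lower threshold lets weaker matches through.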

Use Edit Distance

FuzzySearch.mostSimilarByEditDistance(String pattern, List candidates, int threshold, Boolean allowSubstitution)

The mostSimilarByEditDistance methods have the following arguments:

Argument            Description
pattern             The input string
candidates          The possible matches
threshold           The edit distance threshold, an integer with default value 10
allowSubstitution   A Boolean value: if true, Levenshtein distance is used; if false, LCS (longest common subsequence) distance is used
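The difference between the two distances can be sketched in Java as below. Levenshtein distance counts insertions, deletions and substitutions; LCS distance allows only insertions and deletions, which equals |a| + |b| - 2 × LCS(a, b). The class name EditDistanceSketch and the case-insensitive comparison are assumptions, not taken from FuzzySearch.groovy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the FuzzySearch.groovy implementation.
class EditDistanceSketch {

    // Levenshtein: insertions, deletions and substitutions each cost 1.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // LCS distance: insertions and deletions only, |a| + |b| - 2 * LCS(a, b).
    static int lcsDistance(String a, String b) {
        int[][] l = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                l[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? l[i - 1][j - 1] + 1
                        : Math.max(l[i - 1][j], l[i][j - 1]);
            }
        }
        return a.length() + b.length() - 2 * l[a.length()][b.length()];
    }

    // Candidates whose distance is lower than the threshold, closest match first.
    static List<String> mostSimilar(String pattern, List<String> candidates,
                                    int threshold, boolean allowSubstitution) {
        String p = pattern.toLowerCase();
        Map<String, Integer> distances = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String c : candidates) {
            String lc = c.toLowerCase();
            int d = allowSubstitution ? levenshtein(p, lc) : lcsDistance(p, lc);
            if (d < threshold) {
                result.add(c);
                distances.put(c, d);
            }
        }
        result.sort((x, y) -> Integer.compare(distances.get(x), distances.get(y)));
        return result;
    }
}
```

Because edit distance is an absolute count, a short pattern such as "Deli" compared with this sketch against ["Happy Thai", "Delicious Seafood", "Pete's Deli"] at threshold 10 keeps "Pete's Deli" (distance 7) but also "Happy Thai" (distance 9), while "Delicious Seafood" (distance 13) is dropped.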

Use Word Count

FuzzySearch.mostSimilarByWordCount(String pattern, List candidates, int threshold)

The mostSimilarByWordCount methods have the following arguments:

Argument     Description
pattern      The input string
candidates   The possible matches
threshold    The matching threshold, an integer with default value 1
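A minimal Java sketch of word-count matching is shown below. It assumes that words are split on whitespace and compared case-insensitively, and that threshold is the minimum number of shared words; the class name WordCountSketch and these semantics are assumptions, not taken from FuzzySearch.groovy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, NOT the FuzzySearch.groovy implementation.
class WordCountSketch {

    // Lower-cased set of whitespace-separated words.
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\s+")));
        result.remove("");
        return result;
    }

    // Number of distinct words the two strings have in common.
    static int sharedWords(String a, String b) {
        Set<String> common = words(a);
        common.retainAll(words(b));
        return common.size();
    }

    // Candidates sharing the highest number of words with the pattern,
    // provided that number is at least the threshold (assumed semantics).
    static List<String> mostSimilar(String pattern, List<String> candidates, int threshold) {
        int best = threshold;
        List<String> result = new ArrayList<>();
        for (String c : candidates) {
            int n = sharedWords(pattern, c);
            if (n > best) {
                best = n;
                result.clear();
            }
            if (n == best) result.add(c);
        }
        return result;
    }
}
```

Unlike the n-gram approach, whole-word matching ignores partial overlaps: in this sketch "Deli" shares one word with "Pete's Deli" but none with "Delicious Seafood".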

Results

Each method returns an ordered list of matching candidates. Which candidates are included depends on the algorithm you choose:

  • Cosine Similarity: all candidates whose similarity score with the pattern is greater than the threshold, ordered by closest match first.
  • Edit Distance: all candidates whose edit distance is lower than the threshold, ordered by closest match first.
  • Word Count: the candidates that have the most words in common with the pattern.

Example

Suppose we want to allow someone to use natural language to choose a restaurant from a list of nearby restaurants. Let's say the list of nearby restaurants is retrieved using an API and stored in a variable 'restaurantNames'. To check if an input contains a restaurant name that is in the list using cosine similarity, we can use the following code:

def matchingItems = FuzzySearch.mostSimilarByCosineSimilarity(_.userInputText, restaurantNames, 0.40)

If the value of 'restaurantNames' was ["Happy Thai", "Delicious Seafood", "Pete's Deli"] and the user input text was "Deli", the value of 'matchingItems' would be:

["Pete's Deli", "Delicious Seafood"]
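As a rough check of why this order comes out, assume the class compares lower-cased character bigrams (degree 2) without padding; the actual implementation may tokenize differently, so the exact scores are illustrative only. "deli" yields the bigrams de, el, li, and both restaurant names containing "deli" share exactly those three:

```latex
\cos(\mathbf{p},\mathbf{c}) = \frac{\mathbf{p}\cdot\mathbf{c}}{\lVert\mathbf{p}\rVert\,\lVert\mathbf{c}\rVert}
\qquad
\begin{aligned}
\text{pete's deli (10 distinct bigrams)} &: \frac{3}{\sqrt{3}\,\sqrt{10}} \approx 0.548 \\
\text{delicious seafood (16 distinct bigrams)} &: \frac{3}{\sqrt{3}\,\sqrt{16}} \approx 0.433 \\
\text{happy thai (no shared bigrams)} &: 0
\end{aligned}
```

Both non-zero scores exceed the 0.40 threshold, and the higher score is listed first.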

Credits

The CosineSimilarity class was written by Burt Beckwith. The source can be found in Grails core. For more details on the Cosine Similarity algorithm, see Fuzzy Matching with Cosine Similarity.

Download

This extension is also available in a demo solution, which can be downloaded here: Fuzzy Search Demo.solution
You can download the FuzzySearch.groovy file here: FuzzySearch.groovy
