We provide a Groovy class containing several algorithms, including Cosine Similarity, to find matching items in a list based on a user input. This class is especially useful for finding matching items in lists that are dynamically populated, which might rule out the use of custom entities.
Add the file FuzzySearch.groovy to the Resources in your solution and set the path to /script_lib.
For more details on managing files in your solution, see Resource File Manager.
We provide three fuzzy search algorithms: Cosine Similarity (based on n-grams), Edit Distance and Word Count.
You can read more about this topic in the following knowledge article: Fuzzy Search in Teneo.
You can call the FuzzySearch class in any script in Teneo Studio, for example in script nodes, in listeners, or as a script condition in transitions. The code can be used like this:
FuzzySearch.mostSimilarByCosineSimilarity(String pattern, List candidates, double threshold, int degree)
The mostSimilarByCosineSimilarity methods have the following arguments:
Argument | Description |
---|---|
pattern | The input string |
candidates | The possible matches |
threshold | The matching threshold, a value between 0 and 1 |
degree | N-gram degree, an integer with default value 2 |
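
To make the threshold and degree arguments more concrete, here is a minimal sketch of cosine similarity over character n-grams. It is not the source of FuzzySearch.groovy: the helper names ngramCounts, cosineSimilarity and rankByCosine are hypothetical, and lowercasing the input and counting overlapping character n-grams are assumptions made for illustration.

```groovy
// Illustrative sketch only; not the FuzzySearch.groovy source.

// Count the overlapping character n-grams of a lowercased string,
// e.g. "deli" with n = 2 gives de, el, li.
Map<String, Integer> ngramCounts(String text, int n) {
    String s = text.toLowerCase()
    Map<String, Integer> counts = [:]
    for (int i = 0; i + n <= s.length(); i++) {
        String gram = s.substring(i, i + n)
        counts[gram] = (counts[gram] ?: 0) + 1
    }
    return counts
}

// Cosine of the angle between the two n-gram count vectors:
// 0 means no shared n-grams, 1 means identical n-gram profiles.
double cosineSimilarity(String a, String b, int n = 2) {
    Map<String, Integer> va = ngramCounts(a, n)
    Map<String, Integer> vb = ngramCounts(b, n)
    if (va.isEmpty() || vb.isEmpty()) {
        return 0.0d
    }
    double dot = 0.0d
    va.each { gram, count -> dot += count * (vb[gram] ?: 0) }
    double normA = Math.sqrt(va.values().sum { it * it } as double)
    double normB = Math.sqrt(vb.values().sum { it * it } as double)
    return dot / (normA * normB)
}

// Keep candidates scoring at or above the threshold, best match first.
List<String> rankByCosine(String pattern, List<String> candidates, double threshold, int n = 2) {
    return candidates
            .findAll { cosineSimilarity(pattern, it, n) >= threshold }
            .sort(false) { -cosineSimilarity(pattern, it, n) }
}
```

In this sketch a higher threshold keeps fewer candidates, and a higher degree compares longer character sequences, which generally makes matching stricter. The real class may tokenize or normalize differently.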
FuzzySearch.mostSimilarByEditDistance(String pattern, List candidates, int threshold, Boolean allowSubstitution)
The mostSimilarByEditDistance methods have the following arguments:
Argument | Description |
---|---|
pattern | The input string |
candidates | The possible matches |
threshold | The edit distance threshold, an integer with default value 10 |
allowSubstitution | A Boolean value, if true use Levenshtein distance; if false use LCS distance |
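
The difference between the two distances selected by allowSubstitution can be sketched as follows. Levenshtein distance counts insertions, deletions and substitutions, while LCS distance counts insertions and deletions only, so a substitution costs two edits. This is a minimal sketch and not the FuzzySearch.groovy source; the function name editDistance is hypothetical.

```groovy
// Illustrative sketch only; not the FuzzySearch.groovy source.
int editDistance(String a, String b, boolean allowSubstitution) {
    // dist[i][j] = edits needed to turn the first i chars of a into the first j chars of b
    int[][] dist = new int[a.length() + 1][b.length() + 1]
    for (int i = 0; i <= a.length(); i++) { dist[i][0] = i }
    for (int j = 0; j <= b.length(); j++) { dist[0][j] = j }
    for (int i = 1; i <= a.length(); i++) {
        for (int j = 1; j <= b.length(); j++) {
            // Cheapest of deleting from a or inserting into a
            int best = Math.min(dist[i - 1][j], dist[i][j - 1]) + 1
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                // Matching characters cost nothing
                best = Math.min(best, dist[i - 1][j - 1])
            } else if (allowSubstitution) {
                // Levenshtein only: replace one character for a cost of 1
                best = Math.min(best, dist[i - 1][j - 1] + 1)
            }
            dist[i][j] = best
        }
    }
    return dist[a.length()][b.length()]
}
```

For example, turning "kitten" into "sitting" takes 3 edits with substitution allowed and 5 without. Unlike the cosine score, a smaller distance means a closer match.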
FuzzySearch.mostSimilarByWordCount(String pattern, List candidates, int threshold)
The mostSimilarByWordCount methods have the following arguments:
Argument | Description |
---|---|
pattern | The input string |
candidates | The possible matches |
threshold | The matching threshold, an integer with default value 1 |
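
This article does not spell out how the word count score is computed. One plausible reading, shown here purely as an assumption, is that it counts the distinct whole words shared by the pattern and a candidate, with the threshold being the minimum number of shared words. The function name sharedWordCount and the tokenization rule are hypothetical.

```groovy
// Illustrative sketch only; the scoring rule is an assumption,
// not the FuzzySearch.groovy source.
int sharedWordCount(String pattern, String candidate) {
    // Split on anything that is not a letter, digit or apostrophe, ignoring case
    Set<String> patternWords = pattern.toLowerCase().split(/[^\p{L}\p{N}']+/).findAll { it } as Set
    Set<String> candidateWords = candidate.toLowerCase().split(/[^\p{L}\p{N}']+/).findAll { it } as Set
    // Number of distinct words that occur in both strings
    return patternWords.intersect(candidateWords).size()
}
```

Under this reading, an input such as "I want the seafood place" shares one word with "Delicious Seafood" and would pass a threshold of 1.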
Each method returns an ordered list of matching candidates. The contents of the list differ according to the fuzzy search algorithm you choose.
Suppose we want to allow someone to use natural language to choose a restaurant from a list of nearby restaurants. Let's say the list of nearby restaurants is retrieved using an API and stored in a variable 'restaurantNames'. To check if an input contains a restaurant name that is in the list using cosine similarity, we can use the following code:
def matchingItems = FuzzySearch.mostSimilarByCosineSimilarity(_.userInputText, restaurantNames, 0.40)
If the value of 'restaurantNames' was ["Happy Thai", "Delicious Seafood", "Pete's Deli"] and the user input text was Deli, the value of 'matchingItems' would be:
["Pete's Deli", "Delicious Seafood"]
The CosineSimilarity class was written by Burt Beckwith. The source can be found in Grails core. For more details on the Cosine Similarity algorithm, see Fuzzy Matching with Cosine Similarity.
This extension is also available in a demo solution that can be downloaded here: Fuzzy Search Demo.solution. Download the FuzzySearch.groovy file here: FuzzySearch.groovy