Tuesday, March 6, 2012

"Fuzzy search" component


Good day.

I'd like to add "fuzzy search" functionality to my application.

"Fuzzy search" in this topic means selecting (from DB table) rows, which have "fuzzy search" coefficient (calculated using etalon string) not less some_predefined_const. Fuzzy search coefficient calculating algorithm can be various.

So with etalon string "Margaret" "fuzzy search" can find "Nargaret", "Margoret", "Margret" etc.

IMHO time to develop, test and tune code must be quite long. I prefer to buy such "fuzzy search" component.

Does anybody know where can I get such component - server version (SQL 2005) or client version (.NET)?
How much can such component cost?

Thanks.

Hello,

Have you looked at the SOUNDEX() function? Although not offering the complete solution, it'll give you something to build upon. For example, if CustName contained Margaret, Nargaret and Margoret, you could:

SELECT * FROM dbo.Customers WHERE SOUNDEX(CustName) LIKE '_626'

This will return all three CustName values.

Cheers,

Rob

|||

No, it won′t. The Naragret is a "n" variation, therefore you probably would need a stronger SOUNDEX functionality like the one that comes with SQL Server. See the following sample to test the soundex:

CREATE TABLE #SomeTable

(

nameCol VARCHAR(50)

)

GO

INSERT INTO #SomeTable

VALUES ('Margaret')

INSERT INTO #SomeTable

VALUES ('Nargaret')

INSERT INTO #SomeTable

VALUES ('Margoret')

SELECT * FROM #SomeTable

WHERE SOUNDEX(namecol) = SOUNDEX('Margaret')

nameCol

--

Margaret

Margoret

(2 row(s) affected)

I would sugegst buying a thrid party component which can be either used from your application code or within the SQLCLR (if you don′t mind using SQLCLR in your SQL Server)

Jens K. Suessmeyer

http://www.sqlserver2005.de

|||

Sure it will:

Code Snippet

create table test(CustName varchar(40) not null)

go

insert into test

select 'Margaret'

union

select 'Nargaret'

union

select 'Margoret'

go

SELECT * FROM dbo.test WHERE SOUNDEX(CustName) LIKE '_626'

|||

Sure, but for the simple way of just passing in values and not just assuming that only the first character is changed, e.g. someone put a typo in there writing Amgaret (which would be A526) instead of Margaret you probabyl would need a real fuzzy lookup instead of only replacing the first char.

I don′t know if that fits the original posters need, but sure your solution fits for this particular case. Maybe I misunderstood the situation or the description was too vague to identify the problem.

Jens K. Suessmeyer

http://www.sqlserver2005.de

|||Hi all.

Thank you for answers Smile

I cant use soundex with my problem (example can explain why)

Typical scenario to use "fuzzy lookup" functionality:

0. Initial data loading process inserts into DB table song title and author name: "HUNG UP MADONNA".
There could be some errors during initial data loading process (cause there are no input information quality check).
And song title and author name could be "HANG UP MADONNA" or "HUNG UP MADONA".

1. Manager wants to find this song and view some additional info.
2. He doesn't know what is the right query string "HUNG UP MADONNA" or "MADONNA HUNG UP"
3. Manager enters some similarity coefficient, defining minimal "similarity value".
4. Application retrieves some rows from DB using query string and similarity coefficient.
5. Manager works with the row with max similarity coefficient.

Does anybody know some third party components?
Or how do you think how much can it cost? Its my manager's question Smile.

No comments:

Post a Comment