Recently we worked on a project where we had to make a database of domain names searchable with high accuracy. Domain names are special because they have no separators between words, so ordinary full-text search does not give good relevance or accuracy.
To understand the issue, imagine a table with a full-text index on the domain name. If we search for “consultant”, we won’t find SphinxConsultant.com in the results. The problem is that “sphinxconsultant” is indexed as one complete keyword, when ideally “sphinx” and “consultant” should be treated as separate keywords. A LIKE search does not give good accuracy either: searching for “sphinx consultant” will not return our desired domain, because the domain name contains no space.
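To make the problem concrete, here is a minimal Python sketch. It does not reproduce how any particular search engine tokenizes text; the `tokenize` helper is hypothetical and simply splits on non-alphanumeric characters, roughly as a typical full-text tokenizer would. The LIKE search is imitated with a plain substring test.

```python
import re

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (hypothetical helper)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

domain = "SphinxConsultant.com"
tokens = tokenize(domain)

# Full-text style match: the query must equal a whole indexed keyword.
fulltext_hit = "consultant" in tokens

# LIKE '%sphinx consultant%' style match: a plain substring test.
like_hit = "sphinx consultant" in domain.lower()

print(tokens, fulltext_hit, like_hit)
# ['sphinxconsultant', 'com'] False False
```

Both lookups miss, because the only keywords available are “sphinxconsultant” and “com”.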
This is where Sphinx search came to the rescue with its new feature, “WordBreaker”. The word breaker tries to split a compound word into multiple keywords and make them searchable. It does this using a dictionary, which can be built using “indexer” and the index. The word breaker tells the index that “SphinxConsultant.com” contains three separate words, “sphinx”, “consultant” and “com”, so each of these words becomes searchable.
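The general idea of dictionary-based word breaking can be sketched in a few lines of Python. This is only an illustration of the technique, not Sphinx's actual algorithm: the frequency dictionary `FREQS` is made up, and the `split_compound` and `break_domain` helpers are hypothetical names. The sketch uses dynamic programming to pick the split whose words are most frequent in the dictionary.

```python
import math

# Made-up word frequencies; a real dictionary would be built from indexed data.
FREQS = {"sphinx": 50, "consultant": 40, "consult": 30, "ant": 10, "com": 200}
TOTAL = sum(FREQS.values())

def split_compound(text):
    """Return the most probable split of text into dictionary words, or None."""
    n = len(text)
    # best[i] holds (log-probability, words) for the best split of text[:i].
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is None or word not in FREQS:
                continue
            score = best[start][0] + math.log(FREQS[word] / TOTAL)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1] if best[n] else None

def break_domain(domain):
    """Split each dot-separated label; keep a label whole if no split exists."""
    keywords = []
    for label in domain.lower().split("."):
        if not label:
            continue
        keywords.extend(split_compound(label) or [label])
    return keywords

print(break_domain("SphinxConsultant.com"))
# ['sphinx', 'consultant', 'com']
```

Scoring by frequency is what makes “sphinx consultant” win over “sphinx consult ant” here: fewer, more common words give a higher total score.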
So searching URLs, domain names and similar content is now as easy as searching any other content with Sphinx.