Our client has database of 100,000,000+ patents applications records and 400,000,000+ trademark applications. They needed to make all this information searchable for users. They were using mysql for search but that was very slow and inaccurate. Client attempted to move this search to Sphinx Search on their own. The amount of data was huge and their indexing would run for many days but could not complete. Due to this the whole product was at risk. After their failed attempt client reached to us for expert advice on Sphinx Search. Our team performed an audit on Sphinx Search implementation. We identified many issues in configurations and database structure which was causing slow indexing and crashing. We rewrote the index definitions from scratch so it can index the data of this scale. We used distributed indexing to distribute load on multiple indices and index data in parallel. Also used sql_joined_field feature to further optimize the queries for indexing. We introduced delta indexing so data becomes available more frequently. Also performed relevancy tuning for queries so they find results more accurately and fast. This data was stored in multiple database and queries were slow. We optimized settings for mysql to speed up the queries. The search can be seen in action here.


Waseem is consultant for Elastic Stack. He is Elastic Certified Engineer. Has years of experience with Elasticsearch, Solr, Wazuh, Sphinx Search, Manticore Search, OpenSearch and full text searching.