Advanced sorting using Sphinx Search Expressions

Sphinx Search supports custom expressions that combine internal attributes, index attributes, conditions, arithmetic operators, and functions to compute a value for each document/record. With the SPH_SORT_EXPR sort mode, results can then be sorted on that computed value. This post shows how expressions can be used to implement flexible sorting that accommodates complex business rules.

The Problem

In practice we don’t always want to sort search results on pure relevance or on one fixed criterion. This is very common in eCommerce, where attributes like price, popularity, origin, and page views are also important factors in search results. Imagine an eCommerce site that wants to promote best-selling products and put them ahead of others in search results, but doesn’t want to ignore relevance. To keep the results relevant, it only wants to give a 20% advantage to best sellers. Note that “best seller” here is not an on/off flag; it is the actual number of times the product has been sold. This number can range from 0 into the thousands for the same search terms. We can’t achieve this by simply sorting on relevance and order count, because that sorts on relevance first and uses order count only to break ties.

The Solution

This is where Sphinx Search expressions help. Assume the index has an “int” attribute “sale_count” for each product, holding the number of orders placed for that product. An expression that sorts results 80% on relevance and 20% on sale_count looks like this:

((weight/{max_weight}) * 80) + ((sale_count/{max_sale_count}) * 20)

Note that {max_weight} and {max_sale_count} are placeholders, not Sphinx Search variables. “weight” is an internal variable (written @weight in the PHP code below) and “sale_count” is available in the index as an attribute.

The expression looks simple; the real trick is calculating max_weight and max_sale_count. “max_weight” is the maximum weight assigned to any product in the results for a term. The weight depends on the matching mode, ranker, field weights, number of words in the query, etc., and it is calculated internally by Sphinx. If we calculate max_weight manually with a formula like “maxPossibleWeight = wordCount*totalWeight*1000+999”, the value is far too large in range and not accurate either, which in practice shrinks the weight/{max_weight} factor.

So the best way to get a max_weight that applies to the given search term is to run an extra Sphinx query sorted purely on relevance, take its first match, and use that match’s weight as max_weight. This value is exact for the query and makes the expression accurate as well. The same goes for max_sale_count: we need the matched product with the highest sale count for this particular search term. Once both values are known from the extra queries, substitute them into the expression. Sorting on the expression then gives results ordered 80% on relevance and 20% on sale count. See the example PHP code and results below.
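
Before the full example, here is a minimal sketch of why the choice of normalizer matters. The numbers are illustrative assumptions rather than values from a real index: a two-word query, a total field weight of 2, and a best match whose weight is 2593. The helper name relevanceShare is hypothetical and is not part of the Sphinx API.

```php
<?php
// Relevance part of the sort expression: (weight / maxWeight) * 80.
// relevanceShare() is a hypothetical helper, not part of the Sphinx API.
function relevanceShare(int $weight, int $maxWeight, int $share = 80): float
{
    return ($weight / $maxWeight) * $share;
}

// Illustrative assumptions: 2 query words, total field weight 2,
// and a best match whose weight is 2593.
$wordCount   = 2;
$totalWeight = 2;
$bestWeight  = 2593;

// Theoretical ceiling from the formula quoted in the text.
$theoreticalMax = $wordCount * $totalWeight * 1000 + 999; // 4999

// Normalizing by the ceiling: the best match gets only about 41.5 of 80 points.
$byCeiling = relevanceShare($bestWeight, $theoreticalMax);

// Normalizing by the observed maximum: the best match gets the full 80 points.
$byObserved = relevanceShare($bestWeight, $bestWeight);

printf("ceiling: %.1f, observed: %.1f\n", $byCeiling, $byObserved);
```

Under these assumptions, dividing by the theoretical ceiling caps relevance at roughly half of its intended 80 points, so the 20-point sale count term would weigh far more than planned.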

require ( "sphinxapi.php" ); 
$cl = new SphinxClient(); 
$cl->SetServer('localhost', 9312);

// Return matches as a plain indexed array, so $result['matches'][0] below is the top match
$cl->SetArrayResult(true);
$cl->SetMatchMode(SPH_MATCH_EXTENDED);

$query = '@(name,description) ' . $cl->EscapeString($_GET['q']);
$index = 'products';

// Find product with max weight
$cl->SetSortMode(SPH_SORT_RELEVANCE);
$cl->SetLimits(0, 1);

$maxWeight = $maxSaleCount = 1;
$result = $cl->Query($query, $index);
if (isset($result['matches'])) {
    $maxWeight = $result['matches'][0]['weight'];
}

// Find product with max sale count among the matches
$cl->SetSortMode(SPH_SORT_EXTENDED, 'sale_count DESC');
$cl->SetLimits(0, 1);

$result = $cl->Query($query, $index);

if (isset($result['matches'])) {
    $maxSaleCount = $result['matches'][0]['attrs']['sale_count'];
    // Avoid division by zero in the expression when nothing has been sold
    $maxSaleCount = $maxSaleCount > 0 ? $maxSaleCount : 1;
}

// Now the actual query to fetch results
$expression = '((@weight/' . $maxWeight . ') * 80) + ((sale_count/' . $maxSaleCount . ') * 20)';
$cl->SetSortMode(SPH_SORT_EXPR, $expression);
$cl->SetLimits(0, 10);
$results = $cl->Query($query, $index);

print_r($results);

Note: if you copy the code, check that the > characters (in -> and in the comparison) come through intact.
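
The string concatenation on the $expression line can also be wrapped in a small helper, which makes the 80/20 split adjustable. This is only a sketch: buildSortExpression is a hypothetical function, not part of sphinxapi.php, and the maxima passed in the example call (2593 and 1000) are made-up values.

```php
<?php
// buildSortExpression() is a hypothetical helper, not part of sphinxapi.php.
// It builds the SPH_SORT_EXPR string for a given relevance/sales split.
function buildSortExpression(int $maxWeight, int $maxSaleCount, int $relevanceShare = 80): string
{
    if ($relevanceShare < 0 || $relevanceShare > 100) {
        throw new InvalidArgumentException('relevanceShare must be between 0 and 100');
    }

    // Guard against division by zero when nothing matched or nothing sold.
    $maxWeight    = max(1, $maxWeight);
    $maxSaleCount = max(1, $maxSaleCount);
    $salesShare   = 100 - $relevanceShare;

    return sprintf(
        '((@weight/%d) * %d) + ((sale_count/%d) * %d)',
        $maxWeight,
        $relevanceShare,
        $maxSaleCount,
        $salesShare
    );
}

// Example call with made-up maxima.
echo buildSortExpression(2593, 1000), "\n";
// prints: ((@weight/2593) * 80) + ((sale_count/1000) * 20)
```

In the code above, the result would be passed as the second argument of $cl->SetSortMode(SPH_SORT_EXPR, ...) in place of the hand-built $expression.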

The matches array printed by print_r will look like the array below. Here @expr holds the final value of the expression for each result, and the results are sorted on those values.

[matches] => Array
(
	[0] => Array
		(
			[id] => 21615
			[weight] => 2593
			[attrs] => Array
				(
					[sale_count] => 108
					[@expr] => 80.122604370117
				)

		)

	[1] => Array
		(
			[id] => 21617
			[weight] => 2593
			[attrs] => Array
				(
					[sale_count] => 56
					[@expr] => 80.063575744629
				)

		)

	[2] => Array
		(
			[id] => 21618
			[weight] => 2593
			[attrs] => Array
				(
					[sale_count] => 50
					[@expr] => 80.056762695312
				)

		)

	[3] => Array
		(
			[id] => 21616
			[weight] => 2593
			[attrs] => Array
				(
					[sale_count] => 36
					[@expr] => 80.040870666504
				)

		)

	[4] => Array
		(
			[id] => 21619
			[weight] => 2593
			[attrs] => Array
				(
					[sale_count] => 30
					[@expr] => 80.034057617188
				)

		)

	[5] => Array
		(
			[id] => 63688
			[weight] => 2591
			[attrs] => Array
				(
					[sale_count] => 80
					[@expr] => 80.029113769531
				)

		)

	[6] => Array
		(
			[id] => 315
			[weight] => 2588
			[attrs] => Array
				(
					[sale_count] => 118
					[@expr] => 79.979698181152
				)

		)

	[7] => Array
		(
			[id] => 63672
			[weight] => 2591
			[attrs] => Array
				(
					[sale_count] => 34
					[@expr] => 79.976898193359
				)

		)

	[8] => Array
		(
			[id] => 67359
			[weight] => 2591
			[attrs] => Array
				(
					[sale_count] => 25
					[@expr] => 79.966674804688
				)

		)

	[9] => Array
		(
			[id] => 63654
			[weight] => 2591
			[attrs] => Array
				(
					[sale_count] => 24
					[@expr] => 79.965545654297
				)

		)

)
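
As a sanity check, the @expr values in this output can be reproduced by hand. The post does not state the two maxima used for this query, so the values below are an assumption, back-solved from the printed numbers: a top weight of 2593 and a top sale_count of roughly 17617.

```php
<?php
// Maxima back-solved from the printed @expr values (an assumption; the post
// does not state them): top weight 2593, top sale_count roughly 17617.
$maxWeight    = 2593;
$maxSaleCount = 17617;

// A few rows copied from the output: [id, weight, sale_count, @expr].
$rows = [
    [21615, 2593, 108, 80.122604370117],
    [63688, 2591,  80, 80.029113769531],
    [315,   2588, 118, 79.979698181152],
];

$computed = [];
foreach ($rows as [$id, $weight, $saleCount, $reported]) {
    // Same formula as the sort expression, evaluated with PHP floats.
    $score = ($weight / $maxWeight) * 80 + ($saleCount / $maxSaleCount) * 20;
    $computed[$id] = $score;
    printf("id %d: computed %.4f, reported %.4f\n", $id, $score, $reported);
}
```

With these assumed maxima the recomputed scores should line up with the reported ones to about four decimal places. Any remaining difference in later digits is presumably down to Sphinx evaluating the expression at a lower floating-point precision than PHP.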

Notice that many products have exactly the same “weight” (relevance), but among them the product with the higher sale count is placed before those with lower sale counts. Let me know your thoughts and comments 🙂

4 comments on “Advanced sorting using Sphinx Search Expressions”

  1. Thanks for this example. I have been trying to get it to work. One thing… In this line, what is “page_views.” I don’t see it mentioned anywhere else.

    $expression = '((@weight/' . $maxWeight . ') * 80) + ((page_views/' . $maxSaleCount . ') * 20)';

    Thanks!

    Craig

    1. Thanks Craig for finding this mistake. The “page_views” should have been sale_count so the expression will look like
      $expression = '((@weight/' . $maxWeight . ') * 80) + ((sale_count/' . $maxSaleCount . ') * 20)';
      So the 80% is relevance calculated by Sphinx and 20% is weight for sale_count of that product.

  2. Awesome!! Thanks so much for taking the time to answer and correct this. It is correctly working on my site now which is awesome… If I need any further help, I will contact you for your services. Thanks for freely offering this nugget of wisdom. Well done.
