Elasticsearch – Sorting and paging nested documents

We use Elasticsearch to support our reporting backend. One of the primary requirements is paging and sorting the results. Paging the documents is very easy and works out of the box, but sorting nested documents can become tricky. I was tasked with sorting the documents by properties that are not stored in Elasticsearch. We use Elasticsearch like a key-value store: each document is keyed by its id, and the value is a nested document.

Document structure. Suppose we have the two documents below, whose values are represented as nested documents under data. We have to sort the two documents using the properties of those nested entries.

{ "id": 123,
  "data": [
    {"dataid": 1, "value": 2},
    {"dataid": 2, "value": 3}
  ]
}
{ "id": 124,
  "data": [
    {"dataid": 1, "value": 4},
    {"dataid": 2, "value": 5}
  ]
}

Suppose we also have a properties table:

 dataid  datatext  property1  property2
   1      "One"       2.5        2.6
   2      "Two"       3.4        5.0

If we just want to sort the results by the value stored for a given dataid, it is pretty straightforward. For more information, please refer to the Elasticsearch Guide.

{
 "from": 0,
 "size": 50,
 "sort": [
 {
   "data.value": {
       "order": "asc",
       "nested_filter": {
           "term": {
              "data.dataid": "1"
            }
         }
       }
     }
  ],
 "filter": {
     }
 }
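
A request like the one above can also be assembled in code, which makes the paging arithmetic explicit. The following is a minimal sketch, not the original implementation: the function name nested_sort_request and its parameters are hypothetical, the field names in the usage are taken from the sample documents, and the empty "filter" element is left out.

```python
import json


def nested_sort_request(page, page_size, sort_field, filter_field, filter_value, order="asc"):
    """Build a search body that sorts on one nested field and returns one page.

    `page` is zero-based, so the offset sent as "from" is page * page_size.
    All names here are illustrative assumptions, not part of any client library.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "from": page * page_size,
        "size": page_size,
        "sort": [
            {
                sort_field: {
                    "order": order,
                    # Only nested entries matching this term take part in the sort.
                    "nested_filter": {"term": {filter_field: filter_value}},
                }
            }
        ],
    }


# Third page of 50 results, sorted by the value stored for dataid 1.
body = nested_sort_request(
    page=2,
    page_size=50,
    sort_field="data.value",
    filter_field="data.dataid",
    filter_value="1",
)
print(json.dumps(body, indent=1))
```

The resulting dictionary can be serialized and sent as the search body with whatever HTTP or Elasticsearch client is in use.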

But what if we have to sort by something that is not in the document, for example datatext or property1? We could use a custom score to sort. The script can be generated dynamically. For example, to sort on property1 in descending order, dataid 2 (3.4) should come before dataid 1 (2.5), so the script gives dataid 2 the lower score and we sort ascending on _score. For paging, just specify from and size in the query. The query would look something like this.

{
  "from": 0,
  "size": 50,
  "sort": [
    {
      "_score": {
        "order": "asc"
      }
    }
  ],
  "query": {
    "nested": {
      "path": "data",
      "query": {
        "custom_score": {
          "query": {
            "bool": {
              "should": [
                {
                  "term": {
                    "data.dataid": 1
                  }
                },
                {
                  "term": {
                    "data.dataid": 2
                  }
                }
              ]
            }
          },
          "script": "if(doc['data.dataid'].value=='1') { return 2 } else if(doc['data.dataid'].value=='2') { return 1 } else { return 0 }"
        }
      }
    }
  }
}

Similarly, if we had to sort by datatext, we could generate the script dynamically and swap it in for the required sort. This way we need not store all the properties in every document; we just generate the script when needed.
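
To illustrate generating the script dynamically, here is a minimal sketch that derives it from the properties table. It assumes the table is available as an in-memory Python dict; the names PROPERTIES, build_rank_script and build_sorted_query are hypothetical and only serve this example.

```python
# Properties table from the post, keyed by dataid (assumed to be held in memory).
PROPERTIES = {
    1: {"datatext": "One", "property1": 2.5, "property2": 2.6},
    2: {"datatext": "Two", "property1": 3.4, "property2": 5.0},
}


def build_rank_script(properties, sort_key, descending=False, field="data.dataid"):
    """Return a script that maps each dataid to its rank under sort_key.

    Rank 1 is the entry that should come first; the query sorts ascending
    on _score. Unknown dataids fall through to 0, as in the post's script.
    """
    if not properties:
        raise ValueError("properties must not be empty")
    ordered = sorted(
        properties,
        key=lambda dataid: properties[dataid][sort_key],
        reverse=descending,
    )
    branches = [
        "if(doc['%s'].value=='%s') { return %d }" % (field, dataid, rank)
        for rank, dataid in enumerate(ordered, start=1)
    ]
    return " else ".join(branches) + " else { return 0 }"


def build_sorted_query(script, dataids, page=0, page_size=50):
    """Wrap the script in the nested custom_score query shown in the post."""
    return {
        "from": page * page_size,
        "size": page_size,
        "sort": [{"_score": {"order": "asc"}}],
        "query": {
            "nested": {
                "path": "data",
                "query": {
                    "custom_score": {
                        "query": {
                            "bool": {
                                "should": [
                                    {"term": {"data.dataid": d}} for d in dataids
                                ]
                            }
                        },
                        "script": script,
                    }
                },
            }
        },
    }


# Sort on property1, highest first: dataid 2 gets rank 1, dataid 1 gets rank 2.
script = build_rank_script(PROPERTIES, "property1", descending=True)
body = build_sorted_query(script, sorted(PROPERTIES))
```

Switching to another column only means passing a different sort_key, for example "datatext". Note that entries with equal property values still receive distinct ranks in this sketch.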

Feel free to contact me @abhishek376 or leave a comment below.
