
Don’t sort on tokenized strings in Solr

15 December 2009

Apache Solr is a very powerful indexing and search engine. Unfortunately, it does have some flaws; at least, I doubt that the behavior described here is “by design”.

If you are going to sort on a field, make sure it uses one of Solr’s native data types, and don’t enable a tokenizer on that data type. If you do, you might end up with an HTTP 500 Internal Server Error and error log messages like this:

SEVERE: java.lang.RuntimeException: there are more terms than documents in field "title", but it's impossible to sort on tokenized fields

I found out that a couple of my fields used a data type with a tokenizer and some filters. That was quite unnecessary, since I never search on those fields; I only use them for display and sorting, and I have another field that I do search on.

Keep it simple. Use “string” for strings you don’t have to search on. If you have to both search and sort on the same content, make two fields: a tokenized one for searching and a “string” one for sorting. For example, name the sort field something like “title.sort”.
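
As a rough sketch of the two-field approach, a schema.xml fragment might look something like the one below. The exact analyzer chain on the “text” type is only an illustration, and using copyField to fill “title.sort” is my assumption about a convenient way to populate it; you could just as well send both fields from your indexing code.

```xml
<!-- schema.xml (fragment): illustrative sketch, not a complete schema -->
<types>
  <!-- Untokenized type: the whole value is indexed as one term, so it is safe to sort on -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

  <!-- Tokenized type for searching; this analyzer chain is just an example -->
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <!-- Search (and display) on this field -->
  <field name="title" type="text" indexed="true" stored="true"/>
  <!-- Sort on this field only -->
  <field name="title.sort" type="string" indexed="true" stored="false"/>
</fields>

<!-- Assumption: copy the raw title into the sort field at index time -->
<copyField source="title" dest="title.sort"/>
```

With a schema along these lines, a query would search on “title” and pass sort=title.sort asc to order the results.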