Setting up spellchecking in Solr is a little complicated - you have to set up the spellcheck component, define a dictionary, and add the component to the necessary search handlers. Even then, Sunspot doesn’t support spellchecking by default. Here I’ll explain how to set up a basic spellchecking system using the out-of-the-box solrconfig that comes with Sunspot, and give you some code I wrote that provides an interface between Sunspot and Solr’s spellchecking system.
Setting up spell checking
There are a couple of steps to this. First, decide what you want to spellcheck against. You might want to auto-correct all English words in a search, but more likely you want to help a user find something they might have misspelled, such as somebody’s name or a product name. To do that, pop open solrconfig.xml and find the searchComponent definition for “spellcheck”. Change the “field” from “name” to a field that is actually in your application. Fields set up by Sunspot are dynamic fields, so the Solr field name is probably the name you defined in your “searchable” block followed by a suffix indicating what kind of field it is. For example, let’s say you have a User model that looks like this:
class User < ActiveRecord::Base
  searchable do
    text :username
  end
  ...
end
Then in the “field” segment, you’d want to put in “username_text”, like so:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">username_text</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
  ...
</searchComponent>
Notice we added the “buildOnCommit” parameter, which will cause the dictionary to re-build when new records with the username field are committed. Unless you want to manually re-build the dictionary from time to time, you probably want this parameter or “buildOnOptimize”.
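If you do go the manual route, the rebuild can be scripted instead of being triggered from a browser. Here is a minimal sketch, assuming the “spellcheck.build=true” parameter is sent to a “/solr/spell” handler as in the build URL used later in this post. The method names and the host are placeholders of mine, not part of Sunspot or Solr.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds the URL that asks Solr to rebuild the
# spellcheck dictionary. The "/solr/spell" path is an assumption; point it
# at whichever handler has the spellcheck component attached.
def spellcheck_build_uri(base_url, handler_path = "/solr/spell")
  query = URI.encode_www_form([["q", "*"], ["spellcheck.build", "true"]])
  URI.parse("#{base_url.chomp('/')}#{handler_path}?#{query}")
end

# Hypothetical helper: issues the rebuild request and returns true when
# Solr answers with a 2xx status.
def rebuild_spellcheck_dictionary(base_url)
  response = Net::HTTP.get_response(spellcheck_build_uri(base_url))
  response.is_a?(Net::HTTPSuccess)
end
```

Something like `rebuild_spellcheck_dictionary("http://yoursearchserver.com")` could then be called from a rake task or a cron job.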
Finally, you’ll want to set your search handlers to use the spellcheck component so that it returns spelling suggestions along with your result set. Add “spellcheck” to the “last-components” array at the end of the “standard” and “dismax” search handlers, along with a default option of “spellcheck=true”. You may also want to set a default count (“spellcheck.count”), and you probably want “spellcheck.collate=true” as a default, which will generate a new search string made up of the top spellcheck suggestion for each word in the query. It should look something like this:
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="mm">
      2&lt;-1 5&lt;-2 6&lt;90%
    </str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name_text</str>
    <str name="f.text.hl.fragmenter">regex</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
We actually use a few more options to specify returning only more popular results, and limiting the number of spelling suggestions. Our setup looks more like this:
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    ...
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">3</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
For a complete list of what those options do, see the SpellCheckComponent description on the Solr Wiki.
Now, to construct the dictionary for the first time, go to http://yoursearchserver.com/solr/spell?q=*&spellcheck.build=true and wait for the request to finish. It might take a bit to build the dictionary, but once that request completes it should be ready. Finally, perform a normal search through the Solr admin interface at http://yoursearchserver.com/solr/admin, preferably with a misspelled version of a username. Notice the block at the end of the returned document? It should look something like this (I searched for “delll ultrasharp”):
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="delll">
      <int name="numFound">2</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <arr name="suggestion">
        <str>Jello</str>
        <str>Tell</str>
      </arr>
    </lst>
    <lst name="ultrasharp">
      <int name="numFound">3</int>
      <int name="startOffset">6</int>
      <int name="endOffset">16</int>
      <arr name="suggestion">
        <str>ultraDice</str>
        <str>UltraBall</str>
        <str>UltraDeep</str>
      </arr>
    </lst>
    <str name="collation">Jello ultraDice</str>
  </lst>
There are the spelling suggestions, and even the collated search string. But how do we access that from Sunspot in our Rails application?
Integrating with Sunspot
Sunspot doesn’t support spelling suggestions out of the box. So I wrote a really quick interface to the spellchecker and added it to the searching DSL. With the configuration changes above to make sure you’re indexing search terms, drop this code in something like ‘/lib/sunspot_spellcheck.rb’ and require it in an initializer.
# Query component that adds the spellcheck parameters to the Solr request.
module Sunspot
  module Query
    class Spellcheck < Connective::Conjunction
      attr_accessor :options

      def initialize(options = {})
        @options = options
      end

      # Turns { :only_more_popular => true } into
      # { :spellcheck => true, "spellcheck.onlyMorePopular" => true }.
      def to_params
        options = {}
        @options.each do |key, val|
          options["spellcheck." + Sunspot::Util.method_case(key)] = val
        end
        { :spellcheck => true }.merge(options)
      end
    end
  end
end

module Sunspot
  module Query
    class CommonQuery
      def spellcheck(options = {})
        @components << Spellcheck.new(options)
      end
    end
  end
end

module Sunspot
  module Search
    class AbstractSearch
      attr_accessor :solr_result

      # The untouched "suggestions" array from the Solr response, or nil.
      def raw_suggestions
        ["spellcheck", "suggestions"].inject(@solr_result) { |h, k| h && h[k] }
      end

      # Hash of search term => array of suggestions, plus a "collation" entry.
      def suggestions
        suggestions = raw_suggestions
        return nil unless suggestions.is_a?(Array)
        suggestions_hash = {}
        suggestions.each_with_index do |sug, index|
          next unless sug.is_a?(String)
          break unless suggestions.count > index + 1
          suggestions_hash[sug] = suggestions[index + 1].try(:[], "suggestion") || suggestions[index + 1]
        end
        suggestions_hash
      end

      # Every suggested word for every term as one flat array. The original
      # inject concatenated [key, value] pairs, which mixed the search terms
      # and nested arrays into the result; the collation is left out here.
      def all_suggestions
        (suggestions || {}).reject { |term, _| term == "collation" }.values.flatten
      end

      def collation
        suggestions.try(:[], "collation")
      end
    end
  end
end

# Adds `spellcheck` to the search DSL.
module Sunspot
  module DSL
    class StandardQuery
      def spellcheck(options = {})
        @query.spellcheck(options)
      end
    end
  end
end

module Sunspot
  module Util
    class << self
      # :only_more_popular => "onlyMorePopular"
      def method_case(string_or_symbol)
        string = string_or_symbol.to_s
        first = true
        string.split('_').map! { |word| word = first ? word : word.capitalize; first = false; word }.join
      end
    end
  end
end
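The “suggestions” method above checks for an Array because the suggestions come back as a flat list that alternates between a term and its details, rather than as a hash. The standalone sketch below shows the same pairing idea in a simplified form. The sample array is hand-written for illustration (it is not captured from a real Solr response), and “pair_suggestions” is a hypothetical name.

```ruby
# Hand-written sample mimicking the flat [term, details, term, details, ...]
# array found under "spellcheck" => "suggestions" in the response.
raw = [
  "angr", { "numFound" => 3, "suggestion" => ["angry", "tanga", "bang"] },
  "brds", { "numFound" => 3, "suggestion" => ["birds", "words", "nerds"] },
  "collation", "angry birds"
]

# Hypothetical helper: walks the array two items at a time and keeps either
# the "suggestion" list (for terms) or the raw value (for the collation).
def pair_suggestions(flat)
  flat.each_slice(2).each_with_object({}) do |(term, details), result|
    next if details.nil? # an odd-length array ends with a dangling key; skip it
    result[term] = details.is_a?(Hash) ? details["suggestion"] : details
  end
end

paired = pair_suggestions(raw)
```

Unlike the method in the post, this version assumes the array always alternates strictly between keys and values, which keeps the example short.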
Now when you perform a search you can instruct it to return spelling suggestions like so:
@search = User.search do
  keywords params[:q]
  spellcheck
end
This will ensure the “spellcheck=true” param is passed in the Solr request. That is redundant here, since we already put it in the defaults for the standard and dismax handlers in solrconfig.xml above. However, there’s more: you can pass options to the spellchecker by passing a hash to the spellcheck method. That looks something like this:
@search = User.search do
  keywords params[:q]
  spellcheck :only_more_popular => true, :count => 5
end
Now the spellchecker will return only more popular suggestions, and up to five of them, regardless of the defaults set in solrconfig.xml. Handy, no?
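To make the mapping concrete, here is a standalone sketch of how an options hash turns into Solr parameters. It re-implements the same camel-casing idea as “method_case” outside of Sunspot so that it runs on its own; “lower_camel_case” and “spellcheck_params” are hypothetical names used only for this illustration.

```ruby
# Converts :only_more_popular to "onlyMorePopular", the casing used by
# Solr's spellcheck.* parameters.
def lower_camel_case(string_or_symbol)
  first, *rest = string_or_symbol.to_s.split("_")
  ([first] + rest.map(&:capitalize)).join
end

# Prefixes every option with "spellcheck." and always switches the
# spellchecker on.
def spellcheck_params(options = {})
  params = { :spellcheck => true }
  options.each do |key, value|
    params["spellcheck.#{lower_camel_case(key)}"] = value
  end
  params
end

params = spellcheck_params(:only_more_popular => true, :count => 5)
```

The resulting keys, “spellcheck.onlyMorePopular” and “spellcheck.count”, are the same parameter names set as defaults in solrconfig.xml earlier, which is why values passed at query time take precedence over those defaults.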
To access the spelling suggestions in the resulting search object, call the “suggestions” method. It returns a hash whose keys are the search terms and whose values are arrays of suggestions, plus a “collation” entry when Solr returns one. It will look something like this:
# after searching for "angr brds"
@search.suggestions
#=> { "angr" => ["angry", "tanga", "bang"],
#     "brds" => ["birds", "words", "nerds"],
#     "collation" => "angry birds" }
This way you can easily parse through the terms in the query and get at the suggestions. Also, if you just want the collation (so you could suggest an alternate search similar to Google’s “Did you mean?” feature), you can call the “collation” method, and if the search returned a collated suggestion, it will be returned as a string.
@search.collation #=> "angry birds"
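As a rough idea of how the collation could feed a “Did you mean?” prompt, here is a small sketch. The helper name and the message wording are my own assumptions, and it deliberately stays silent when the collation is missing or identical to what the user typed.

```ruby
# Hypothetical helper: returns a "Did you mean" message, or nil when there
# is nothing worth suggesting.
def did_you_mean(query, collation)
  return nil if collation.nil? || collation.strip.empty?
  # Suggesting the exact query back to the user would just be noise.
  return nil if collation.strip.downcase == query.to_s.strip.downcase
  "Did you mean: #{collation.strip}?"
end
```

In a controller or view helper this might be called as `did_you_mean(params[:q], @search.collation)`, rendering the message only when the result is not nil.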
We’re working on slicing and dicing dictionaries and different types of searches on our site to try and put the best near-matches in front of the user, but hopefully this will get you started. Please leave any questions / comments below!