Setting up spellchecking in Solr is a little complicated - you have to set up the spellcheck component, define a dictionary, and add the component to the necessary search handlers. Even then, Sunspot doesn’t support spellchecking by default. Here I’ll explain how to set up a basic spellchecking system using the out-of-the-box solrconfig that comes with Sunspot, and give you some code I wrote that provides an interface between Sunspot and Solr’s spellchecking system.

Setting up spell checking

There are a couple of steps to this. First, decide what you want to spell check on. You might want to auto-correct all English words in a search, but more likely you want to help a user find something they might have misspelled, such as somebody’s name or a product name. To do that, open solrconfig.xml and find the searchComponent definition for “spellcheck.” Change the “field” from “name” to a field that actually exists in your application. Fields set up by Sunspot are dynamic fields, so the field name is probably the name you defined in your “searchable” block followed by a suffix denoting what kind of field it is. For example, let’s say you have a User model that looks like this:

class User < ActiveRecord::Base
  searchable do
    text :username
  end
  ...
end

Then in the “field” segment, you’d want to put in “username_text”, like so:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">username_text</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
      <str name="buildOnCommit">true</str>
    </lst>
    ...
</searchComponent>

Notice we added the “buildOnCommit” parameter, which will cause the dictionary to re-build when new records with the username field are committed. Unless you want to manually re-build the dictionary from time to time, you probably want this parameter or “buildOnOptimize”.
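If you do decide to rebuild by hand (from a cron job or a Rake task, say), the request can be scripted. The following is only a rough sketch: the helper names, the base URL, and the “select” handler path are assumptions for illustration, and the handler you point it at must have the spellcheck component attached. The only Solr-specific piece is the “spellcheck.build=true” parameter.

```ruby
require "uri"
require "net/http"

# Builds the URL for a one-off dictionary rebuild request.
# base_url and handler are placeholders; point them at your own Solr instance.
def spellcheck_build_uri(base_url, handler = "select", dictionary = "default")
  base = base_url.end_with?("/") ? base_url : "#{base_url}/"
  uri = URI.join(base, handler)
  uri.query = URI.encode_www_form(
    "q" => "*:*",
    "rows" => 0,                           # we only care about the side effect
    "spellcheck" => "true",
    "spellcheck.build" => "true",          # asks the component to rebuild
    "spellcheck.dictionary" => dictionary  # name from the searchComponent block
  )
  uri
end

# Fires the request; returns true when Solr answers with a 2xx status.
def rebuild_spellcheck_dictionary(base_url, handler = "select")
  response = Net::HTTP.get_response(spellcheck_build_uri(base_url, handler))
  response.is_a?(Net::HTTPSuccess)
end
```

Calling rebuild_spellcheck_dictionary("http://localhost:8982/solr") would then issue the request against a local development instance, assuming that is where your Solr server listens.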

Finally, you’ll want to set your search handlers to use the spellcheck component so that it returns spelling suggestions along with your result set. Add “spellcheck” to the “last-components” array at the end of the “standard” and “dismax” search handlers, along with a default option of “spellcheck=true”. You probably also want to set a default count, and set “collate=true” as a default, which will generate a new search string made up of the top spellcheck suggestion for each word in the query. It should look something like this:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="mm">
      2&lt;-1 5&lt;-2 6&lt;90%
    </str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name_text</str>
    <str name="f.text.hl.fragmenter">regex</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

We actually use a few more options to specify returning only more popular results, and limiting the number of spelling suggestions. Our setup looks more like this:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    ...
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">3</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

For a complete list of what those options do, see the SpellCheckComponent description on the Solr Wiki.

Now, to construct the dictionary for the first time, go to http://yoursearchserver.com/solr/spell?q=*&spellcheck.build=true - it might take a bit to build the dictionary, but once that request completes it should be ready. Finally, perform a normal search through the Solr admin interface at http://yoursearchserver.com/solr/admin - preferably for a misspelled version of a username. Notice the block at the end of the returned document? It should look something like this (I searched for “delll ultrasharp”):

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="delll">
      <int name="numFound">2</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <arr name="suggestion">
        <str>Jello</str>
        <str>Tell</str>
      </arr>
    </lst>
    <lst name="ultrasharp">
      <int name="numFound">3</int>
      <int name="startOffset">6</int>
      <int name="endOffset">16</int>
      <arr name="suggestion">
        <str>ultraDice</str>
        <str>UltraBall</str>
        <str>UltraDeep</str>
      </arr>
    </lst>
    <str name="collation">Jello ultraDice</str>
  </lst>
</lst>

There are the spelling suggestions, and even the collated search string. But how do we access that from Sunspot in our Rails application?

Integrating with Sunspot

Sunspot doesn’t support spelling suggestions out of the box. So I wrote a really quick interface to the spellchecker and added it to the searching DSL. With the configuration changes above to make sure you’re indexing search terms, drop this code in something like ‘/lib/sunspot_spellcheck.rb’ and require it in an initializer.

module Sunspot
  module Query
    class Spellcheck < Connective::Conjunction
      attr_accessor :options

      def initialize(options = {})
        @options = options
      end

      def to_params
        options = {}
        @options.each do |key, val|
          options["spellcheck." + Sunspot::Util.method_case(key)] = val
        end
        { :spellcheck => true }.merge(options)
      end
    end
  end
end

module Sunspot
  module Query
    class CommonQuery
      def spellcheck(options = {})
        @components << Spellcheck.new(options)
      end
    end
  end
end

module Sunspot
  module Search
    class AbstractSearch
      attr_accessor :solr_result

      def raw_suggestions
        ["spellcheck", "suggestions"].inject(@solr_result){|h,k| h && h[k]}
      end

      def suggestions
        suggestions = raw_suggestions
        return nil unless suggestions.is_a?(Array)

        suggestions_hash = {}
        index = -1
        suggestions.each do |sug|
          index += 1
          next unless sug.is_a?(String)
          break unless suggestions.count > index + 1
          suggestions_hash[sug] = suggestions[index+1].try(:[], "suggestion") || suggestions[index+1]
        end
        suggestions_hash
      end

      # Flat list of every suggested word; the "collation" string is skipped.
      def all_suggestions
        (suggestions || {}).values.grep(Array).flatten
      end

      def collation
        suggestions.try(:[], "collation")
      end
    end
  end
end

module Sunspot
  module DSL
    class StandardQuery
      def spellcheck(options = {})
        @query.spellcheck(options)
      end
    end
  end
end

module Sunspot
  module Util
    class << self
      def method_case(string_or_symbol)
        string = string_or_symbol.to_s
        first = true
        string.split('_').map! { |word| word = first ? word : word.capitalize; first = false; word }.join
      end
    end
  end
end
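The least obvious part of the code above is the “suggestions” method. With the Ruby response format, Solr hands the suggestions back as one flat list of alternating keys and values rather than as a hash, which is why the method walks the array by index. Here is a standalone sketch of the same idea, free of Sunspot, so you can try it in irb. The method name and the sample data are made up for illustration; the sample mirrors the XML response shown earlier.

```ruby
# Pairs up a flat [key, value, key, value, ...] list into a Hash.
# For per-term entries the "suggestion" array is pulled out of the details
# hash; any other value (such as the collation string) is kept as-is.
def suggestions_to_hash(flat)
  return nil unless flat.is_a?(Array)

  flat.each_slice(2).each_with_object({}) do |(term, details), result|
    next if details.nil? # odd-length list: ignore the dangling key
    result[term] = details.is_a?(Hash) ? details.fetch("suggestion", details) : details
  end
end

# Hypothetical response fragment, shaped like the XML example earlier.
sample = [
  "delll",      { "numFound" => 2, "startOffset" => 0, "endOffset" => 5,
                  "suggestion" => ["Jello", "Tell"] },
  "ultrasharp", { "numFound" => 3, "startOffset" => 6, "endOffset" => 16,
                  "suggestion" => ["ultraDice", "UltraBall", "UltraDeep"] },
  "collation",  "Jello ultraDice"
]

parsed = suggestions_to_hash(sample)
puts parsed["delll"].inspect # prints ["Jello", "Tell"]
puts parsed["collation"]     # prints Jello ultraDice
```

Slicing the list into pairs is a little easier to follow than tracking an index by hand, but the result is the same kind of hash that the “suggestions” method builds.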

Now when you perform a search you can instruct it to return spelling suggestions like so:

@search = User.search do
  keywords params[:q]
  spellcheck
end

This will ensure the “spellcheck=true” param is passed into the Solr request. Strictly speaking this is unnecessary, since we put that in the defaults for the standard and dismax handlers in solrconfig.xml above. However, there’s more: you can pass options to the spellchecker by passing a hash to the spellcheck method. That looks something like this:

@search = User.search do
  keywords params[:q]
  spellcheck :only_more_popular => true, :count => 5
end

Now the spellchecker will only return more popular suggestions, and five of them, regardless of the defaults set in solrconfig.xml. Handy, no?
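If you’re wondering what actually goes over the wire, the option keys are converted from snake_case to the lowerCamelCase names Solr expects and prefixed with “spellcheck.”. The following is a standalone approximation of that translation, not the Sunspot code itself; both method names here are local helpers written for this example.

```ruby
# Converts :only_more_popular to "onlyMorePopular" (lowerCamelCase).
def method_case(name)
  first, *rest = name.to_s.split("_")
  ([first] + rest.map(&:capitalize)).join
end

# Turns an options hash into the request parameters sent to Solr.
def spellcheck_params(options = {})
  options.each_with_object({ "spellcheck" => true }) do |(key, value), params|
    params["spellcheck.#{method_case(key)}"] = value
  end
end

params = spellcheck_params(:only_more_popular => true, :count => 5)
params.each { |key, value| puts "#{key}=#{value}" }
# prints:
#   spellcheck=true
#   spellcheck.onlyMorePopular=true
#   spellcheck.count=5
```

Any request parameter the SpellCheckComponent understands can be passed this way, as long as you write its name in snake_case.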

To access the spelling suggestions in the resulting search object, call the “suggestions” method. It will return a hash whose keys are the search terms and whose values are arrays of suggestions, plus a “collation” key holding the collated string when there is one. It will look something like this:

# after searching for "angr brds"
@search.suggestions
#=> { "angr" => ["angry", "tanga", "bang"], "brds" => ["birds", "words", "nerds"], "collation" => "angry birds" }

This way you can easily parse through the terms in the query and get at the suggestions. Also, if you just want the collation (so you could suggest an alternate search similar to Google’s “Did you mean?” feature), you can call the “collation” method, and if the search returned a collated suggestion, it will be returned as a string.

@search.collation
#=> "angry birds"
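To turn that into a “Did you mean?” prompt, you mostly need to guard against the cases where there is nothing worth suggesting. Here is a small sketch of such a helper. The method name and the message wording are my own invention; adapt them to your views.

```ruby
# Returns a "Did you mean ...?" string, or nil when there is nothing useful
# to suggest (no collation, or a collation identical to the original query).
def did_you_mean(query, collation)
  return nil if collation.nil? || collation.strip.empty?
  return nil if collation.strip.casecmp(query.to_s.strip).zero?

  "Did you mean \"#{collation.strip}\"?"
end

puts did_you_mean("angr brds", "angry birds") # prints Did you mean "angry birds"?
```

In a Rails view you would call it with something like did_you_mean(params[:q], @search.collation) and render the result only when it is not nil.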

We’re working on slicing and dicing dictionaries and different types of searches on our site to try and put the best near-matches in front of the user, but hopefully this will get you started. Please leave any questions / comments below!