Sequel::Elasticsearch

Sequel::Elasticsearch allows you to transparently mirror your database, or specific tables, to Elasticsearch. It’s especially useful if you want the power of search through Elasticsearch, but want to keep the sanity and structure of a relational database.

Installation

Add this line to your application’s Gemfile:

gem 'sequel-elasticsearch'

And then execute:

$ bundle

Or install it yourself as:

$ gem install sequel-elasticsearch

Usage

Require the gem with:

require 'sequel/plugins/elasticsearch'

You’ll need an Elasticsearch cluster to sync your data to. By default the gem will try to connect to http://localhost:9200. Set the ELASTICSEARCH_URL ENV variable to the URL of your cluster.
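
For example, assuming you want to configure the connection from Ruby rather than your shell (the URL below is just a placeholder), you could set the variable before the plugin is loaded:

# Point the plugin at your own cluster; replace the URL as needed.
ENV['ELASTICSEARCH_URL'] ||= 'http://search.example.com:9200'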

This is a Sequel plugin, so you can enable it DB wide:

Sequel::Model.plugin :elasticsearch

Or per model:

Document.plugin :elasticsearch

# or

class Document < Sequel::Model
  plugin :elasticsearch
end

There are a couple of options you can set:

Sequel::Model.plugin :elasticsearch,
  elasticsearch: { log: true }, # Options to pass to the Elasticsearch Ruby client
  index: 'all-my-data', # The index in which the data should be stored. Defaults to the table name associated with the model
  type: 'is-mine' # The type in which the data should be stored.

And that’s it! Just transact as you normally would, and your records will be created and updated in the Elasticsearch cluster.
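
As a minimal sketch (the Document model and its columns here are just examples), everyday Sequel calls are all it takes to keep the index in sync:

doc = Document.create(title: 'Sequel', body: 'First draft') # indexed in Elasticsearch after create
doc.update(body: 'Second draft')                            # the Elasticsearch document is updated as well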

Indexing

Ensure that you create the index mappings for your data before using this plugin; otherwise Elasticsearch will infer the mappings dynamically, which can give unexpected results.
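
For example, you could create the index and its mappings up front with the Elasticsearch Ruby client. This is only a sketch: the index name and fields are assumptions, and it targets Elasticsearch 7+, where mapping types have been removed.

require 'elasticsearch'

client = Elasticsearch::Client.new(url: ENV['ELASTICSEARCH_URL'])
# Create the index with explicit mappings before any records are synced
client.indices.create(
  index: 'documents',
  body: {
    mappings: {
      properties: {
        title: { type: 'text' },
        body:  { type: 'text' }
      }
    }
  }
)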

By default, records are indexed using the model’s values hash. Should you need to customize what’s indexed, you can define an indexed_values method (or an as_indexed_json method if you prefer the Rails way).
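
A hedged example of such an override, with illustrative column names:

class Document < Sequel::Model
  plugin :elasticsearch

  # Index only selected columns, plus a derived field
  def indexed_values
    values.slice(:id, :title, :body).merge(word_count: body.to_s.split.size)
  end
end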

Searching

Your model is now searchable through Elasticsearch. Just pass in a string that can be parsed as an Elasticsearch query string query:

Document.es('title:Sequel')
Document.es('title:Sequel AND body:Elasticsearch')

The result from the es method is an enumerable containing Sequel::Model instances of your model:

results = Document.es('title:Sequel')
results.each { |e| p e }
# Outputs
# #<Document @values={:id=>1, :title=>"Sequel", :body=>"Document 1"}>
# #<Document @values={:id=>2, :title=>"Sequel", :body=>"Document 2"}>

The result also contains the meta info about the Elasticsearch query result:

results = Document.es('title:Sequel')
p results.count # The number of documents included in this result
p results.total # The total number of documents in the index that matches the search
p results.timed_out # If the search timed out or not
p results.took # How long the search took, in milliseconds

You can also use the scroll API to search and fetch large datasets:

# Get a dataset that will stay consistent for 5 minutes and extend that time by 1 minute on every iteration
scroll = Document.es('test', scroll: '5m')
p scroll.scroll_id # Outputs the scroll_id for this specific scrolling snapshot
puts "Found #{scroll.count} of #{scroll.total} documents"
scroll.each { |e| p e }
while (scroll = Document.es(scroll, scroll: '1m')) && !scroll.empty?
  puts "Found #{scroll.count} of #{scroll.total} documents"
  scroll.each { |e| p e }
end

Import

You can import the whole dataset, or specify a dataset to be imported. This will create a new, timestamped index for your dataset, and import all the records from that dataset into the index. An alias will be created (or updated) to point to the newly created index.

Document.import! # Import all the Document records. Use the default settings.

Document.import!(dataset: Document.where(active: true)) # Import all the active Document records

Document.import!(
    index: 'active-documents', # Use the active-documents index
    dataset: Document.where(active: true), # Only index active documents
    batch_size: 20 # Send documents to Elasticsearch in batches of 20 records
)

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/jrgns/sequel-elasticsearch.

Features that need to be built:

  • An es method to search through the data on the cluster.
  • Let es return an enumerator of Sequel::Model instances.
  • A rake task to create or suggest mappings for a table.

License

The gem is available as open source under the terms of the MIT License.