Introduction

I needed to run a diagnostic test to see whether I could use Elastic Search on the products I'm currently working on, so I'm putting up this brief write-up of my experience with Elastic Search for testing purposes. The scripts I used are available for download at the end of this article.

Machine Configuration:-

Host system: Mac running Elastic Search
Guest system: VirtualBox, networked via bridged networking

Test Data Set & Elastic Search Configuration:-

One million index entries written by two concurrent writers: the first writing entries zero to five hundred thousand, the second writing entries five hundred thousand to one million.
Indexes set with a duplicate limit of 1, i.e. the nodes feed each other their data (a multi-master configuration).
Tool written in Python to insert mock data containing searchable information: ID, numeral, text, etc.
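
As a rough illustration of how the ID range might be divided between the two writers and what a mock record could look like, here is a minimal sketch. The function names and the record fields (`id`, `numeral`, `text`) are assumptions for illustration; the actual tool is in the zip linked at the end of this post and may differ.

```python
def split_range(lower, upper, writers):
    """Split [lower, upper) into contiguous, near-equal (lower, upper) chunks."""
    base, extra = divmod(upper - lower, writers)
    chunks = []
    start = lower
    for i in range(writers):
        # The first `extra` writers take one additional ID each.
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def make_mock_doc(doc_id):
    """Hypothetical mock record; the field names are assumptions."""
    return {
        "id": doc_id,
        "numeral": doc_id % 1000,
        "text": "mock record %d" % doc_id,
    }


if __name__ == "__main__":
    # Two writers over one million IDs, as in the test above.
    for lower, upper in split_range(0, 1000000, 2):
        print("writer range: %d to %d" % (lower, upper))
```

Each chunk maps onto one invocation of the writer with its own lower and upper bounds.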

Assumptions:

  • The test tool is written as a wrapper over curl using the JSON API. It is not intended for blazing-fast speed. Things could be faster with a native library, and even more so with the native Java protocol, but explicit is always better than implicit, and premature optimization is the root of all evil.
  • The version of Elastic Search used is 0.17.8.
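
To make the "wrapper over curl" idea concrete, here is a minimal sketch of what such a wrapper could look like. It is not the actual script: the function names, the use of the namespace as the index name, and the `doc` type in the URL are all assumptions.

```python
import json
import subprocess


def build_index_command(host, port, namespace, doc_id, doc):
    """Build the curl argv that would index `doc` under `doc_id`."""
    # Assumption: the namespace is used as the index name and "doc" as the
    # type, following the /{index}/{type}/{id} REST layout.
    url = "http://{0}:{1}/{2}/doc/{3}".format(host, port, namespace, doc_id)
    # Note: much newer Elasticsearch releases also expect a
    # Content-Type: application/json header; it is left out here.
    return ["curl", "-s", "-XPUT", url, "-d", json.dumps(doc)]


def index_document(host, port, namespace, doc_id, doc):
    """Shell out to curl and return its exit code (0 means curl succeeded)."""
    return subprocess.call(build_index_command(host, port, namespace, doc_id, doc))


if __name__ == "__main__":
    # Print the command instead of running it, so no cluster is needed.
    print(" ".join(build_index_command("localhost", 9200, "test", 1, {"id": 1})))
```

Spawning one curl process per document is slow, which is the trade-off accepted above in exchange for being able to see exactly what goes over the wire.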

Scenarios Tested:

  • Basic replicating scenario. Stats are attached in the Excel file. I would love to walk anyone through this.

Everything went smoothly. No issues encountered. Please see the attached xlsx file for results.

  • Virtual machine powered off, restarted, and rejoined to the cluster

Everything resynced fine and very fast. No issues encountered.

Other Observations:-

Approx. 50 ms query times for exact string searches while two concurrent threads were writing to the index.
Approx. 150 ms query times for searches containing a substring and 3 ORs while two concurrent threads were writing to the index.
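
The exact queries are not reproduced in this post, so the following is only a sketch of what the two query shapes might look like as JSON bodies, plus a small timing helper. It assumes the `term`, `wildcard` and `bool`/`should` queries of the JSON query DSL; the field name and search fragments are made up for illustration.

```python
import json
import time


def exact_query(field, value):
    """Body for an exact match on a single term."""
    return {"query": {"term": {field: value}}}


def substring_or_query(field, fragments):
    """Body that ORs one substring (wildcard) clause per fragment."""
    clauses = [{"wildcard": {field: "*%s*" % frag}} for frag in fragments]
    return {"query": {"bool": {"should": clauses}}}


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    # Hypothetical field and fragments; three fragments give three ORs.
    print(json.dumps(exact_query("text", "record")))
    print(json.dumps(substring_or_query("text", ["rec", "moc", "ord"])))
```

Wrapping the call that sends the body to the `_search` endpoint in `timed` gives round-trip times comparable in spirit to the figures above, including the curl overhead.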

Other Information:-

http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/
Highlights Solr's weakness with realtime workloads: concurrent writes and reads degrade the performance of a single index.

http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-searchengine-berlinbuzzwords.html
Useful video that highlights Elastic Search internals.

Appendix:

Test Tool Command Line Options:

Usage: writer.py [options]

Options:
  -h, --help            show this help message and exit
  -o HOST, --host=HOST  ip/hostname of es instance e.g.
                        stor.mystormachine.com, 192.168.0.3, localhost, etc
  -p PORT, --port=PORT  port of stor instance use 9200 if unsure
  -n NAME, --name=NAME  specifies namespace e.g. test
  -l LOWER, --lower=LOWER
                        specifies lower number of range
  -u UPPER, --upper=UPPER
                        specifies upper number of range
  -t TEST, --test=TEST  true means to run command, ignore means ignore
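
The help text above has the shape that Python's optparse module produces, so an option parser along the following lines would accept the same flags. This is a sketch reconstructed from the help output, not the script itself; in particular, parsing the range bounds as integers is an assumption.

```python
from optparse import OptionParser


def build_parser():
    """Option parser mirroring the flags shown in the help text above."""
    parser = OptionParser(prog="writer.py")
    parser.add_option("-o", "--host", dest="host",
                      help="ip/hostname of es instance")
    parser.add_option("-p", "--port", dest="port",
                      help="port of es instance, use 9200 if unsure")
    parser.add_option("-n", "--name", dest="name",
                      help="specifies namespace e.g. test")
    # Assumption: the bounds are parsed as integers.
    parser.add_option("-l", "--lower", dest="lower", type="int",
                      help="specifies lower number of range")
    parser.add_option("-u", "--upper", dest="upper", type="int",
                      help="specifies upper number of range")
    parser.add_option("-t", "--test", dest="test",
                      help="true means to run command, ignore means ignore")
    return parser


if __name__ == "__main__":
    options, _ = build_parser().parse_args()
    print(options)
```

With a parser like this, the first writer from the test would be started as `writer.py -o localhost -p 9200 -n test -l 0 -u 500000 -t true`, and the second with `-l 500000 -u 1000000`.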

Links to Scripts / Utilities from this post

http://178-77-103-161.kundenadmin.hosteurope.de:3000/attachments/9/IndexWriterCurlWrapperScript.zip

http://178-77-103-161.kundenadmin.hosteurope.de:3000/attachments/10/elasticsearch.yml

http://178-77-103-161.kundenadmin.hosteurope.de:3000/attachments/11/SummaryIndex.xlsx

P.S. I’m copying this from my own wiki, so if something is formatted to look overly dramatic, please forgive me 🙂
