Thursday 22 October 2015

Elasticsearch crash course

http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html
https://www.elastic.co/guide/index.html
http://www.embulk.org/docs/recipe/scheduled-csv-load-to-elasticsearch-kibana4.html
https://pypi.python.org/pypi/csv2es
https://www.elastic.co/blog/geo-location-and-search

Installing and running Elasticsearch

For the purposes of this tutorial, I'll assume you're on a Linux or Mac environment.
You should also have Java 7 or above installed (the Elasticsearch 1.x releases used here no longer run on Java 6).
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.2.tar.gz
tar -zxvf elasticsearch-1.7.2.tar.gz
cd elasticsearch-1.7.2
bin/elasticsearch
You should see something like this in the terminal.
[2015-09-14 15:32:52,278][INFO ][node                     ] [Big Man] version[1.7.2], pid[10907], build[e43676b/2015-09-14T09:49:53Z]
[2015-09-14 15:32:52,279][INFO ][node                     ] [Big Man] initializing ...
[2015-09-14 15:32:52,376][INFO ][plugins                  ] [Big Man] loaded [], sites []
[2015-09-14 15:32:52,426][INFO ][env                      ] [Big Man] using [1] data paths, mounts [[/ (/dev/sdc1)]], net usable_space [8.7gb], net total_space [219.9gb], types [ext3]
Java HotSpot(TM) Server VM warning: You have loaded library /tmp/es/elasticsearch-1.7.2/lib/sigar/libsigar-x86-linux.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
[2015-09-14 15:32:55,294][INFO ][node                     ] [Big Man] initialized
[2015-09-14 15:32:55,294][INFO ][node                     ] [Big Man] starting ...
[2015-09-14 15:32:55,411][INFO ][transport                ] [Big Man] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.43.172:9300]}
[2015-09-14 15:32:55,428][INFO ][discovery                ] [Big Man] elasticsearch/VKL1HQmyT_KRtmTGznmQyg
[2015-09-14 15:32:59,210][INFO ][cluster.service          ] [Big Man] new_master [Big Man][VKL1HQmyT_KRtmTGznmQyg][Happy][inet[/192.168.43.172:9300]], reason: zen-disco-join (elected_as_master)
[2015-09-14 15:32:59,239][INFO ][http                     ] [Big Man] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.43.172:9200]}
[2015-09-14 15:32:59,239][INFO ][node                     ] [Big Man] started
[2015-09-14 15:32:59,284][INFO ][gateway                  ] [Big Man] recovered [0] indices into cluster_state
Elasticsearch is now running! You can access it at http://localhost:9200 in your web browser, which returns this:
{
  "status" : 200,
  "name" : "Big Man",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.2",
    "build_hash" : "e43676b1385b8125d647f593f7202acbd816e8ec",
    "build_timestamp" : "2015-09-14T09:49:53Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
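If you'd rather check the node from a script than from a browser, here is a minimal Python sketch that reads the same JSON banner. The helper names (fetch_banner, describe_node, version_tuple) are my own and not part of any Elasticsearch client; the sample text is a trimmed copy of the response above, so the parsing part runs without a live node.

```python
import json
from urllib.request import urlopen

# Trimmed copy of the banner shown above, used so the parsing runs offline.
BANNER = """
{
  "status" : 200,
  "name" : "Big Man",
  "cluster_name" : "elasticsearch",
  "version" : { "number" : "1.7.2", "lucene_version" : "4.10.4" },
  "tagline" : "You Know, for Search"
}
"""

def fetch_banner(base_url="http://localhost:9200"):
    """Fetch the root endpoint as text (assumes a node on the default port)."""
    with urlopen(base_url) as response:
        return response.read().decode("utf-8")

def describe_node(banner_text):
    """Return (node name, cluster name, version string) from the banner JSON."""
    info = json.loads(banner_text)
    return info["name"], info["cluster_name"], info["version"]["number"]

def version_tuple(version_string):
    """Turn "1.7.2" into (1, 7, 2) so versions compare numerically."""
    return tuple(int(part) for part in version_string.split("."))

name, cluster, version = describe_node(BANNER)
print(name, cluster, version)
```

Against a running node you would call describe_node(fetch_banner()) instead of passing the sample text.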

Indexing Data

We're now going to index some data into our Elasticsearch instance. We'll use the example of a blog engine, which has a user and some posts.
curl -XPUT 'http://localhost:9200/blog/user/dilbert' -d '{ "name" : "Dilbert Brown" }'

curl -XPUT 'http://localhost:9200/blog/post/1' -d '
{ 
    "user": "dilbert", 
    "postDate": "2011-12-15", 
    "body": "Search is hard. Search should be easy." ,
    "title": "On search"
}'


curl -XPUT 'http://localhost:9200/blog/post/2' -d '
{ 
    "user": "dilbert", 
    "postDate": "2011-12-12", 
    "body": "Distribution is hard. Distribution should be easy." ,
    "title": "On distributed search"
}'


curl -XPUT 'http://localhost:9200/blog/post/3' -d '
{ 
    "user": "dilbert", 
    "postDate": "2011-12-10", 
    "body": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat" ,
    "title": "Lorem ipsum"
}'
Each of these requests should return a response confirming that the operation succeeded, for example:
{"_index":"blog","_type":"post","_id":"1","_version":1,"created":true}
Let's verify that all operations were successful.
curl -XGET 'http://localhost:9200/blog/user/dilbert?pretty=true'
curl -XGET 'http://localhost:9200/blog/post/1?pretty=true'
curl -XGET 'http://localhost:9200/blog/post/2?pretty=true'
curl -XGET 'http://localhost:9200/blog/post/3?pretty=true'
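The same PUT requests can be issued from Python's standard library. This is only a sketch of the request shape: build_index_request and send are names I made up, and BASE_URL assumes the default local address used throughout this tutorial. Nothing is sent until you call send yourself.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9200"  # assumed default address from this tutorial

def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT to /<index>/<type>/<id> with a JSON body."""
    url = "{}/{}/{}/{}".format(BASE_URL, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="PUT")

def send(request):
    """Send the request and decode the JSON reply (needs a running node)."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

post = {
    "user": "dilbert",
    "postDate": "2011-12-15",
    "body": "Search is hard. Search should be easy.",
    "title": "On search",
}
request = build_index_request("blog", "post", 1, post)
print(request.get_method(), request.full_url)
```

With a node running, send(request) should return the same kind of acknowledgement that curl prints.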
Note that there are two main ways of adding data to Elasticsearch:
  1. JSON over HTTP
  2. Native client
We'll explore these in greater detail in a subsequent tutorial.
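As a small taste of the JSON-over-HTTP route beyond one request per document: the _bulk endpoint accepts newline-delimited JSON, where each document is preceded by an action line naming its index, type and id. The sketch below only builds that request body; bulk_body is a hypothetical helper of mine, and the documents are shortened versions of the posts above.

```python
import json

def bulk_body(index, doc_type, documents):
    """Build a newline-delimited _bulk payload from (id, source) pairs."""
    lines = []
    for doc_id, source in documents:
        # Action line: tells Elasticsearch where the next line should be indexed.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

posts = [
    ("1", {"user": "dilbert", "title": "On search"}),
    ("2", {"user": "dilbert", "title": "On distributed search"}),
]
payload = bulk_body("blog", "post", posts)
print(payload)
```

The payload would be sent as the body of a POST to http://localhost:9200/_bulk.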

Searching

Let's see if we can retrieve the documents we just added via search.
Find all blog posts by Dilbert:
curl 'http://localhost:9200/blog/post/_search?q=user:dilbert&pretty=true'
This returns the following JSON result:
{
  "took" : 85,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "1",
      "_score" : 1.0, "_source" : 
{ 
    "user": "dilbert", 
    "postDate": "2011-12-15", 
    "body": "Search is hard. Search should be easy." ,
    "title": "On search"
}
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "2",
      "_score" : 0.30685282, "_source" : 
{ 
    "user": "dilbert", 
    "postDate": "2011-12-12", 
    "body": "Distribution is hard. Distribution should be easy." ,
    "title": "On distributed search"
}
    }, {
      "_index" : "blog",
      "_type" : "post",
      "_id" : "3",
      "_score" : 0.30685282, "_source" : 
{ 
    "user": "dilbert", 
    "postDate": "2011-12-10", 
    "body": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat" ,
    "title": "Lorem ipsum"
}
    } ]
  }
}
Nice!
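In a script you usually care about only a few fields of that response. Here is a minimal sketch that pulls the id, score and title out of each hit. SAMPLE_RESPONSE is a condensed copy of the result above (same ids, scores and titles), and summarize_hits is a name of my own choosing.

```python
import json

# Condensed copy of the search result shown above.
SAMPLE_RESPONSE = """
{
  "took": 85,
  "timed_out": false,
  "hits": {
    "total": 3,
    "max_score": 1.0,
    "hits": [
      {"_id": "1", "_score": 1.0,
       "_source": {"title": "On search", "postDate": "2011-12-15"}},
      {"_id": "2", "_score": 0.30685282,
       "_source": {"title": "On distributed search", "postDate": "2011-12-12"}},
      {"_id": "3", "_score": 0.30685282,
       "_source": {"title": "Lorem ipsum", "postDate": "2011-12-10"}}
    ]
  }
}
"""

def summarize_hits(response_text):
    """Return a list of (id, score, title) tuples, in the order returned."""
    response = json.loads(response_text)
    return [
        (hit["_id"], hit["_score"], hit["_source"]["title"])
        for hit in response["hits"]["hits"]
    ]

for doc_id, score, title in summarize_hits(SAMPLE_RESPONSE):
    print(doc_id, score, title)
```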
All posts whose title doesn't contain the term search:
curl 'http://localhost:9200/blog/post/_search?q=-title:search&pretty=true'
Retrieve the title of all posts whose title contains search but not distributed (the + is written as %2B, because a bare + in a URL query string is decoded as a space):
curl 'http://localhost:9200/blog/post/_search?q=%2Btitle:search%20-title:distributed&pretty=true&fields=title'
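Hand-encoding query strings is easy to get wrong: a literal + in a URL query is normally decoded as a space, so it has to travel as %2B. A minimal sketch of letting Python do the encoding follows; search_url is a hypothetical helper, and the base URL is the one used in this tutorial.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

def search_url(query, fields=None,
               base="http://localhost:9200/blog/post/_search"):
    """Build a URI-search URL with the Lucene query percent-encoded."""
    params = [("q", query), ("pretty", "true")]
    if fields:
        params.append(("fields", ",".join(fields)))
    # quote_via=quote encodes spaces as %20 and "+" as %2B.
    return base + "?" + urlencode(params, quote_via=quote)

url = search_url("+title:search -title:distributed", fields=["title"])
print(url)

# Round trip: decoding the encoded URL gives back the original query.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(repr(decoded))

# An unencoded "+" is decoded as a space, which silently drops the operator.
lossy = parse_qs("q=+title:search")["q"][0]
print(repr(lossy))
```

The colon is also percent-encoded here (as %3A), which is harmless because it is decoded again before the query is parsed.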
A range search on postDate:
curl -XGET 'http://localhost:9200/blog/_search?pretty=true' -d '
{ 
    "query" : { 
        "range" : { 
            "postDate" : { "from" : "2011-12-10", "to" : "2011-12-12" } 
        } 
    } 
}'
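The JSON body of that range search is just a nested dictionary, so it is easy to generate. The sketch below mirrors the from/to keys used in the curl command (the range query also accepts gte and lte for the same bounds); date_range_query and in_range are names I made up, and in_range is only a local illustration of what inclusive bounds mean, not something Elasticsearch runs.

```python
import json
from datetime import date

def date_range_query(field, start, end):
    """Build the request body for a range search between two dates."""
    bounds = {"from": start.isoformat(), "to": end.isoformat()}
    return {"query": {"range": {field: bounds}}}

def in_range(value, start, end):
    """Local illustration only: both bounds are treated as inclusive."""
    return start <= value <= end

start, end = date(2011, 12, 10), date(2011, 12, 12)
body = date_range_query("postDate", start, end)
print(json.dumps(body, indent=4))

# Post dates from the three documents indexed earlier, keyed by id.
post_dates = {
    "1": date(2011, 12, 15),
    "2": date(2011, 12, 12),
    "3": date(2011, 12, 10),
}
matching = sorted(i for i, d in post_dates.items() if in_range(d, start, end))
print(matching)
```

With inclusive bounds, posts 2 and 3 fall inside the range and post 1 does not.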
You'll learn more about the various URL query parameters in a separate tutorial.
The usual Lucene query syntax is available either through the q URL parameter, as in the examples above, or through the query_string query in the JSON query language.
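To make the JSON route concrete, here is a small sketch that wraps a Lucene query string in a query_string query. The helper name query_string_body is my own; the body would be sent to a _search endpoint with -d, like the range example. Because the query travels in the request body, no URL encoding of + or spaces is needed.

```python
import json

def query_string_body(lucene_query, default_field=None):
    """Wrap a Lucene query string in a query_string query body."""
    inner = {"query": lucene_query}
    if default_field is not None:
        # Field searched when a term in the query names no field itself.
        inner["default_field"] = default_field
    return {"query": {"query_string": inner}}

body = query_string_body("+title:search -title:distributed")
print(json.dumps(body, indent=4))

# With a default field, bare terms are matched against the title.
titled = query_string_body("search -distributed", default_field="title")
print(json.dumps(titled, indent=4))
```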

Shutdown

To shut down Elasticsearch, hit Ctrl+C in the terminal where you launched it. This shuts the node down cleanly.
Elasticsearch is fairly robust: even after an OS or disk crash, its index is unlikely to become corrupted.
