• Elasticsearch (ES) is a distributed, RESTful search and analytics engine that handles large volumes of data in near real time. It is widely used as part of the ELK stack (Elasticsearch, Logstash, Kibana) and for full-text search, log and event data analysis, and real-time analytics.

Basic Architecture

  • Node - A single running instance of ES. It holds data and can be part of a cluster.
  • Cluster - A group of nodes that work together and share the same cluster name. The cluster distributes data and operations across its nodes.
  • Shards - ES splits indices into smaller pieces called shards. Each shard is a self-contained index that can be hosted on any node. Shards enable distributed storage and parallel processing.
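To make the idea of spreading documents across shards concrete, here is a minimal sketch. It assumes the simplified rule "hash of the routing key modulo the number of primary shards", with the document ID as the routing key. The helper names `pick_shard` and `distribute` are hypothetical, and `zlib.crc32` is only a stand-in for the hash that Elasticsearch uses internally.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (assumed here to be the document ID) to a shard number."""
    # crc32 is a stand-in hash for illustration; it is not what ES uses internally.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard each one would land on."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


layout = distribute([f"doc-{i}" for i in range(10)], 3)
for shard_number, ids in layout.items():
    print(shard_number, ids)
```

Because the shard is derived from the key alone, the same document ID always maps to the same shard, so a lookup by ID can go straight to the shard that holds the document.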

Index in Elasticsearch

  • An index in ES is a collection of related documents. It is loosely comparable to a table in a relational database.
  • Unlike a SQL table, an index is schema-less by default and is designed for efficient search, with data stored as JSON documents rather than rows.
  • An index is split into shards. Each shard is a fully functional, independent sub-index that holds a subset of the index's documents.
  • When a document is added to ES, its fields are indexed into appropriate data structures right away. If no mapping exists for a field, ES infers the field type from the value; this is known as dynamic mapping.
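The type inference behind dynamic mapping can be illustrated with a toy function. This is a rough approximation of the default rules, not the actual ES implementation: `infer_field_type` and `infer_mapping` are hypothetical helpers, and `datetime.fromisoformat` is only a stand-in for ES date detection, which uses its own set of date formats.

```python
from datetime import datetime


def infer_field_type(value):
    """Guess the ES field type that dynamic mapping would roughly pick for a JSON value."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        first = next((item for item in value if item is not None), None)
        return infer_field_type(first) if first is not None else None
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # stand-in for ES date detection
            return "date"
        except ValueError:
            return "text"  # by default ES also adds a keyword sub-field
    return None  # null values do not add a field


def infer_mapping(document):
    """Infer a field-name -> type dictionary for one JSON document."""
    return {name: infer_field_type(value) for name, value in document.items()}


print(infer_mapping({"title": "quick brown fox", "views": 42, "active": True}))
```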

Document in ES

  • A document in Elasticsearch is the basic unit of information that can be indexed. It is stored in JSON format and contains fields.
  • Fields are key-value pairs within a document.
  • Each field is indexed into a data structure chosen for very fast retrieval: for example, text goes into an inverted index, while numeric and date values go into BKD trees.
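As an illustration of why an inverted index makes text retrieval fast, here is a toy version. It assumes a naive analyzer that lowercases and splits on whitespace, which is far simpler than a real ES analyzer; `build_inverted_index` and `search` are hypothetical helpers.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Build a term -> sorted list of document IDs from a doc_id -> text dict."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {1: "quick brown fox", 2: "lazy brown dog", 3: "quick red fox"}
index = build_inverted_index(docs)
print(search(index, "brown"))      # [1, 2]
print(search(index, "quick fox"))  # [1, 3]
```

A query looks up each term directly and intersects the resulting ID lists, so no document text has to be scanned at search time.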

Mapping/Schema in ES

  • ES can be schema-less: documents are indexed into optimized data structures without explicitly specifying how to handle the different fields that may occur in them.
  • Defining a mapping explicitly gives more control over field types and how each field is indexed.
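A sketch of what an explicit mapping can look like, written as the JSON body of a create-index request (`PUT /<index-name>`). The field names (`title`, `status`, `views`, `published_at`) and the helper `build_create_index_body` are made up for illustration. Setting `"dynamic": "strict"` is one option that makes ES reject documents containing unmapped fields.

```python
import json


def build_create_index_body(strict=True):
    """Build a create-index request body with an explicit mapping (hypothetical fields)."""
    return {
        "mappings": {
            # "strict" rejects documents with unmapped fields;
            # True lets ES add new fields through dynamic mapping.
            "dynamic": "strict" if strict else True,
            "properties": {
                "title": {"type": "text"},         # analyzed for full-text search
                "status": {"type": "keyword"},     # exact-match filtering and aggregations
                "views": {"type": "integer"},
                "published_at": {"type": "date"},
            },
        }
    }


body = build_create_index_body()
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent with any HTTP client or an Elasticsearch client library; that step is left out here so the example runs without a cluster.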

Code Snippets