Elasticsearch (ES) is a distributed, RESTful search and analytics engine capable of handling large volumes of data in near real-time. It’s widely used as part of the ELK stack (Elasticsearch, Logstash, Kibana) for full-text search, log and event data analysis, and real-time analytics.
Basic Architecture
Node - A single running instance of ES. It holds data and can be part of a cluster.
Cluster - A group of nodes that share the same cluster name and work together. Data and operations are distributed across the nodes.
Shards - ES splits indices into smaller pieces called shards. Each shard is a self-contained index that can be hosted on any node. Shards enable distributed storage and parallel processing.
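How documents end up on different shards can be sketched in a few lines. This is a simplified illustration, not Elasticsearch's actual implementation: it assumes the shard is chosen by hashing a routing key (by default the document ID) modulo the number of primary shards, and it uses zlib.crc32 as a stand-in for the Murmur3 hash that ES uses. The names pick_shard and distribute are hypothetical.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a shard number."""
    # crc32 is only a stand-in here; Elasticsearch itself uses a Murmur3 hash.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard they would be routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


placement = distribute([f"doc-{i}" for i in range(10)], num_primary_shards=3)
for shard, ids in placement.items():
    print(shard, ids)
```

Because the shard number depends on the number of primary shards, changing that number would send existing documents to different shards. This is one reason the primary shard count is fixed when an index is created.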
Index in Elasticsearch
An index in ES is a collection of related documents. It is loosely comparable to a table in a relational (SQL) database.
Unlike an SQL table, an index is schema-less by default and is designed for efficient search, with data stored as JSON documents rather than rows.
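An index is split into shards. Each shard is a smaller unit of the index that works as a fully functional, independent sub-index.
When a document is added to ES without an explicit mapping, ES detects the type of each field and indexes it into the appropriate data structure. This automatic type detection is known as dynamic mapping.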
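The idea behind dynamic mapping can be illustrated with a small type-inference function. This is a rough sketch of the default rules, not ES code: the function names and the sample document are made up for illustration, and real dynamic mapping does more (for example, it adds a keyword sub-field to strings and tries to detect dates).

```python
def infer_field_type(value):
    """Return a simplified ES field type for a JSON value, or None if unknown."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_field_type(item)
    # null values and empty arrays do not produce a mapping entry.
    return None


def infer_mapping(document):
    """Infer a field -> type mapping for the top-level fields of a document."""
    mapping = {}
    for field, value in document.items():
        field_type = infer_field_type(value)
        if field_type is not None:
            mapping[field] = field_type
    return mapping


doc = {
    "title": "Intro to search",
    "views": 42,
    "rating": 4.5,
    "published": True,
    "tags": ["es", "search"],
    "author": {"name": "Ann"},
    "subtitle": None,
}
print(infer_mapping(doc))
```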
Document in ES
A document in Elasticsearch is the basic unit of information that can be indexed. It is stored in JSON format and contains fields.
Fields are key-value pairs within documents.
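Each field is indexed into a data structure suited to its type for very fast retrieval: text goes into an inverted index, while numeric and date values go into BKD trees. Boolean values are indexed as terms in the inverted index rather than in a BKD tree.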
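A toy inverted index shows why term lookups are fast: each term points directly to the set of documents that contain it. This is a minimal sketch that assumes simple lowercase whitespace tokenization; real analyzers also handle punctuation, stemming, and stop words. The function names and sample documents are illustrative only.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Elasticsearch is a search engine",
    2: "Lucene powers the search",
    3: "A distributed analytics engine",
}
index = build_inverted_index(docs)
print(sorted(search(index, "search engine")))
```

A query never scans the document texts. It only intersects the sets of document IDs stored for each query term.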
Mapping/Schema in ES
ES can work schema-less: documents are indexed into optimized data structures without the user explicitly specifying how each field in the document should be handled.
When a mapping is defined explicitly, it gives more control over the field types and how they are indexed.
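As an illustration, an explicit mapping is a JSON body listing each field and its type. The sketch below only builds and prints such a body. The field names are hypothetical, and sending it to a cluster (for example as the body of a create-index request for an index named articles, also a made-up name) is left out.

```python
import json

# Hypothetical fields for an "articles" index, chosen only for illustration.
index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},         # analyzed for full-text search
            "status": {"type": "keyword"},     # exact-match filtering and aggregations
            "views": {"type": "integer"},      # numeric range queries
            "published_at": {"type": "date"},  # date range queries
        }
    }
}

print(json.dumps(index_body, indent=2))
```

Here title is declared as text while status is declared as keyword, a distinction that dynamic mapping would not make on its own for two string fields.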