Thinking About Elasticsearch Performance

Elasticsearch is an open-source search engine (https://github.com/elastic/elasticsearch). Search is tightly coupled with most web services, whether e-commerce or social, and SQL eventually hits limits in either functionality or performance. That is where Elasticsearch shines. After using it on a project, I ran into a few performance issues worth noting.

Basic flow

Much as you define tables in an RDBMS, in Elasticsearch you define an index (mapping + data) and then ingest documents into it. To try it quickly, I made a Docker setup: https://github.com/hirotoyoshidome/elasticsearch-query-etc. You can query via curl or use Kibana (https://github.com/elastic/kibana); results come back as JSON, which is easy to consume. Official clients exist for major languages: https://www.elastic.co/guide/en/elasticsearch/client/index.html.
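To make the flow concrete, here is a minimal sketch that only prepares the two REST requests involved (create an index with a mapping, then index one document) without sending them. The index name "stores", the field types, and the helper names are my assumptions for illustration; the paths follow the standard create-index and index-document endpoints.

```python
import json

INDEX = "stores"  # hypothetical index name, not from the project


def create_index_request(index, properties):
    """Return (method, path, body) for creating an index with an explicit mapping."""
    body = {"mappings": {"properties": properties}}
    return "PUT", f"/{index}", json.dumps(body)


def index_document_request(index, doc_id, document):
    """Return (method, path, body) for indexing one document under a given id."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(document)


# Assumed field types for the sample fields.
properties = {
    "store_id": {"type": "integer"},
    "store_name": {"type": "keyword"},
}

create_req = create_index_request(INDEX, properties)
doc_req = index_document_request(INDEX, 1, {"store_id": 1, "store_name": "sample"})

for method, path, body in (create_req, doc_req):
    print(method, path, body)
```

Each tuple can be sent with curl or any HTTP client, with Content-Type: application/json, against the host where Elasticsearch runs.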

Pitfalls I encountered

1. Deeply nested mappings – Elasticsearch lets you nest structures, which is tempting when modeling parent-child relationships:

{
  "store_id": 1,
  "store_name": "sample",
  "employee": [
    {"name": "Mike", "age": 30},
    {"name": "John", "age": 25}
  ],
  "location": {"prefecture": "Tokyo", "city": "Minato-ku"}
}

The deeper the nesting, the more sharply query performance drops as the document count grows. Changing a mapping later is painful, so avoid deep nesting from the start. If you need the matching child entries, inner_hits retrieves them, but it also slows down as data grows.
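One possible way to avoid the nesting, shown only as a sketch and not necessarily what fits every project, is to denormalize before ingesting: emit one flat document per employee and copy the store fields onto each. The function name and the flattened field names (employee_name, employee_age, prefecture, city) are my own choices for illustration.

```python
def flatten_store(store):
    """Turn one store document with an employee array into one flat document per employee."""
    location = store.get("location", {})
    base = {
        "store_id": store["store_id"],
        "store_name": store["store_name"],
        "prefecture": location.get("prefecture"),
        "city": location.get("city"),
    }
    # Copy the shared store fields onto every employee document.
    return [
        {**base, "employee_name": e["name"], "employee_age": e["age"]}
        for e in store.get("employee", [])
    ]


store = {
    "store_id": 1,
    "store_name": "sample",
    "employee": [
        {"name": "Mike", "age": 30},
        {"name": "John", "age": 25},
    ],
    "location": {"prefecture": "Tokyo", "city": "Minato-ku"},
}

flat_docs = flatten_store(store)
for doc in flat_docs:
    print(doc)
```

The trade-off is duplicated store data and more documents, in exchange for queries that hit only top-level fields.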

2. Aggregating without filtering first – Think in SQL terms: aggregations correspond to GROUP BY. Filtering before aggregating performs better, because the aggregation then runs over fewer documents. Elasticsearch also lets you filter after the aggregation (akin to HAVING), but that is slower. Apply filters first, then aggregate.
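As a rough sketch of "filter first, then aggregate", the helper below builds a search body whose bool filter narrows the documents before a terms aggregation groups them. The helper name, the aggregation name by_group, and the field names are assumptions; the term filter and terms aggregation also assume the fields are mapped as keyword.

```python
import json


def filtered_terms_agg(filter_field, filter_value, group_field):
    """Build a search body that filters documents, then groups the survivors."""
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "query": {
            "bool": {
                # Filter context: narrows the document set before aggregating.
                "filter": [{"term": {filter_field: filter_value}}]
            }
        },
        "aggs": {
            # Roughly GROUP BY group_field with a count per bucket.
            "by_group": {"terms": {"field": group_field}}
        },
    }


body = filtered_terms_agg("location.prefecture", "Tokyo", "location.city")
print(json.dumps(body, indent=2))
```

In SQL terms this is roughly WHERE prefecture = 'Tokyo' followed by GROUP BY city, rather than grouping everything and discarding buckets afterwards.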

3. Embedding parameters directly in scripts – You can sort via scripts written in Painless. If you inline parameters inside the script source, Elasticsearch recompiles the script every time, hurting performance and potentially hitting compilation limits. Instead, pass parameters via the params mechanism so the script body stays constant and is reused.
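The difference is easiest to see side by side. This sketch builds a script-based sort two ways: with the value inlined into the source (the pattern to avoid) and with the value passed through params. The field employee_age and the factor parameter are hypothetical; only the shape of the _script sort matters here.

```python
def script_sort_inlined(factor):
    """Anti-pattern: the value is baked into the source, so each value yields a new script."""
    return {
        "sort": {
            "_script": {
                "type": "number",
                "order": "desc",
                "script": {
                    "lang": "painless",
                    "source": f"doc['employee_age'].value * {factor}",
                },
            }
        }
    }


def script_sort_with_params(factor):
    """Preferred: the source stays constant and only params changes between requests."""
    return {
        "sort": {
            "_script": {
                "type": "number",
                "order": "desc",
                "script": {
                    "lang": "painless",
                    "source": "doc['employee_age'].value * params.factor",
                    "params": {"factor": factor},
                },
            }
        }
    }


def source_of(body):
    """Pull the Painless source string out of a sort body."""
    return body["sort"]["_script"]["script"]["source"]


same = source_of(script_sort_with_params(1.5)) == source_of(script_sort_with_params(2.0))
differs = source_of(script_sort_inlined(1.5)) != source_of(script_sort_inlined(2.0))
print(same, differs)
```

Because the source string is identical across requests in the params version, the compiled script can be reused instead of compiled again.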

Closing

These are simple gotchas, but they bit me. Elasticsearch is powerful and well-documented, so read the docs and plan ahead. You can even handle synonym searches (e.g., "ジーパン/デニム/ジーンズ", all Japanese words for jeans or denim) via synonym dictionaries, which is something SQL struggles with. I am still learning, but I hope this helps.