How We Brought In Elasticsearch
- #Elasticsearch
- #AWS
 
I want to share the story of how we replaced the product search logic we had built in-house with Elasticsearch.
Before adopting Elasticsearch, we implemented that search logic with SQL queries on top of an RDB.
As the number of products grew, however, several issues surfaced:
- the need for more flexible search options
- the desire to implement logic tailored to user needs
- declining search performance
 
To overcome these challenges we decided to bring in Elasticsearch.
In this post I will walk through what we actually did during the recent Elasticsearch rollout and the pitfalls we ran into.
Infrastructure required for Elasticsearch
We had already standardized on AWS for infrastructure. From DNS to servers and databases, everything ran on AWS services.
When we first decided to adopt Elasticsearch, we planned to use AWS OpenSearch.
AWS OpenSearch is an open-source project that was forked from Elasticsearch, and the two are now separate products. The AWS documentation states that the OpenSearch roadmap is independent of the Elasticsearch roadmap, so I believe it is best to treat them as distinct products over the long term.
https://aws.amazon.com/jp/what-is/opensearch/
Initially we evaluated building on OpenSearch, but during our investigation we discovered that the Elasticsearch plug-in we wanted was not available on OpenSearch. As a result we abandoned that plan and chose to run Elasticsearch directly on EC2.
If you build on EC2, you also have to configure the cluster and monitoring yourself, so the initial cost felt relatively high.
It is important to confirm ahead of time whether the plug-ins you want are compatible with OpenSearch and to assess whether you truly want to run the cluster on EC2.
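If you do end up running Elasticsearch yourself on EC2, it is at least easy to confirm afterwards which plug-ins each node actually has. Here is a minimal sketch using the official Python client; the endpoint is a placeholder.

```python
from elasticsearch import Elasticsearch

# Placeholder endpoint; point this at one of your own nodes.
es = Elasticsearch("http://localhost:9200")

# One entry per node/plug-in pair installed on the cluster.
for plugin in es.cat.plugins(format="json"):
    print(plugin["name"], plugin["component"], plugin["version"])
```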
Side note: AWS OpenSearch also offers a GUI called OpenSearch Dashboards, similar to Kibana, so you can work visually.
What we wanted to achieve with this rollout
Here is what we were hoping for by bringing in Elasticsearch.
While examining search data, we noticed many cases where variations in how search keywords were written prevented relevant results from showing up.
Examples:
- Searching for “お出かけ” (going out) would miss a product whose name contained the hiragana spelling “おでかけ”.
- If a product name contained the middle dot “・”, searching with the words written together (without the dot) would no longer match.
- Synonyms were also an issue. For example:
  - “釣り” and “フィッシング” (both meaning fishing) produced very different results.
  - “映画館” (movie theater) and “シネマ” (cinema) produced very different results.

Our primary goal was to improve search accuracy by mitigating this kind of keyword variance.
We considered maintaining a synonym dictionary, but its upkeep would take far more time and money than you might expect.
What makes synonym dictionaries challenging is that whether two words are synonyms depends on the service. “Doctor” and “hospital” are not synonyms in the general sense, but treating them as synonyms in search can reduce keyword variance.
Because these issues can occur endlessly, we felt that maintaining a synonym dictionary would be extremely demanding.
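For context, this is roughly what such a synonym filter looks like inside an index's analysis settings. The pairs below are only illustrative; every one of them would have to be curated by hand for our service, which is exactly the maintenance cost described above.

```python
# Illustrative analysis settings with a hand-maintained synonym filter.
# The tokenizer is a stand-in; a Japanese tokenizer would normally go here.
synonym_analysis = {
    "filter": {
        "service_synonyms": {
            "type": "synonym",
            "synonyms": [
                "釣り, フィッシング",
                "映画館, シネマ",
                "医者, 病院",  # service-specific pairing, not a general-purpose synonym
            ],
        }
    },
    "analyzer": {
        "ja_synonym_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["service_synonyms"],
        }
    },
}
```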
At first we evaluated the kuromoji dictionary that also works with AWS OpenSearch, but it did not feel precise enough when we applied it to our search use case.
Although we could have used a user-defined dictionary to fine-tune the search, the maintenance cost would have been too high.
So we chose a different dictionary in place of kuromoji and landed on SudachiDict, which is published on GitHub. This dictionary was not available on AWS OpenSearch at the time we adopted it.
https://github.com/WorksApplications/SudachiDict
We also incorporated search via n-grams. This increases the number of potential hits. If you make the n-gram size too small you introduce noise and risk hurting accuracy, but tuning it to the right value lets you handle issues such as the middle dot example above.
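Putting those pieces together, here is a rough sketch of the index settings and query this leads to, written with the official Python client. It assumes the analysis-sudachi plug-in (which loads SudachiDict) is installed on every node; the index, field, and analyzer names are illustrative, and the exact Sudachi tokenizer and filter option names depend on the plug-in version.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# "name" is analyzed morphologically with Sudachi, "name.ngram" with 2-3 character grams.
# The n-gram sizes are the knob mentioned above: too small and noise grows quickly.
es.indices.create(
    index="products-demo",
    settings={
        "analysis": {
            "tokenizer": {
                "ja_sudachi_tokenizer": {
                    "type": "sudachi_tokenizer",  # provided by analysis-sudachi (assumed installed)
                    "split_mode": "C",            # A/B/C control how coarsely words are split
                },
                "ja_ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3,
                },
            },
            "analyzer": {
                "ja_sudachi_analyzer": {
                    "type": "custom",
                    "tokenizer": "ja_sudachi_tokenizer",
                    # Normalizing to the dictionary form helps absorb spelling variants
                    # such as おでかけ / お出かけ.
                    "filter": ["sudachi_normalizedform"],
                },
                "ja_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ja_ngram_tokenizer",
                },
            },
        }
    },
    mappings={
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "ja_sudachi_analyzer",
                "fields": {
                    "ngram": {"type": "text", "analyzer": "ja_ngram_analyzer"},
                },
            }
        }
    },
)

# Search both fields, weighting the morphological field above the noisier n-gram field.
resp = es.search(
    index="products-demo",
    query={
        "multi_match": {
            "query": "おでかけ",
            "fields": ["name^3", "name.ngram"],
        }
    },
)
print(resp["hits"]["total"])
```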
That is how we ultimately replaced search with Elasticsearch.
Pitfalls
Here are the main pitfalls we encountered.
1. Building an Elasticsearch cluster on EC2
To build a cluster you need the EC2 instances to communicate with each other. Because Elasticsearch stores data, it is preferable to place the nodes in as private an environment as possible.
Even though the instances were on the same network and should have been able to reach each other privately, the nodes would not discover one another and form a cluster. After more research we found that this was not an AWS networking issue: the official Elasticsearch guidance for EC2 is to use the discovery-ec2 plug-in. Once we installed and configured it, the nodes formed a cluster successfully.
https://www.elastic.co/docs/reference/elasticsearch/plugins/discovery-ec2
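For reference, once the plug-in is installed the node-side configuration is small, and it is easy to sanity-check the result from a client. The elasticsearch.yml lines below are the documented way to enable EC2-based discovery; the endpoint and node names are placeholders.

```python
from elasticsearch import Elasticsearch

# On each node, elasticsearch.yml needs roughly the following once the
# discovery-ec2 plug-in is installed, so that nodes find each other via the EC2 API:
#
#   discovery.seed_providers: ec2
#   cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]  # first bootstrap only
#
es = Elasticsearch("http://10.0.0.10:9200")  # placeholder: any node in the cluster

# Confirm that every node actually joined the cluster.
health = es.cluster.health()
print(health["status"], health["number_of_nodes"])
```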
2. Elasticsearch has no master-slave concept
In Elasticsearch there is no permanently fixed master node. Instead, the master-eligible nodes in the cluster elect a master among themselves. This means you also need to be thoughtful about how you expose the cluster's endpoint.
Because the election requires a quorum (a majority of the master-eligible nodes), a two-node cluster is not highly available: lose the wrong node and the survivor cannot form a majority on its own, which is why at least three master-eligible nodes are recommended. This is another point to keep in mind.
We had not understood this mechanism beforehand, so we ended up changing the infrastructure design later. It is crucial to research this early.
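A quick way to see the election in practice is to ask the cluster which node currently holds the master role; again the endpoint is a placeholder.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://10.0.0.10:9200")  # placeholder endpoint

# The elected master can change over time, so clients should not assume a fixed node.
print(es.cat.master(format="json"))

# One row per node; the "master" column marks the currently elected master with "*".
print(es.cat.nodes(format="json", h="name,master,node.role"))
```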
Results after the rollout
After introducing Elasticsearch we also saw improved search performance.
Average ALB response times, which had consistently exceeded 0.5 seconds, now stay below roughly 0.1 seconds. We believe this improvement comes from reducing the load on the database by switching to Elasticsearch.
[Figure: ALB average response time]
Closing thoughts
This was a look back at our recent Elasticsearch rollout.
Replacing a search engine is not easy, but search accuracy is a critical feature. We will keep aiming to make search easier to use and to help users find what they are looking for.