ELK mappings and analyzers

Preetham Umarani
2 min read · Sep 29, 2021

For some basics to understand ELK, please go through this blog.

Mappings are like a class in OOP: you define the fields of a document and the data types of those fields, and these documents are stored in an index.

Reference: link

A mapping defines, among other things: which string fields should be treated as full-text fields; which fields contain numbers, dates, or geo-locations; the format of date values; and custom rules to control the mapping of dynamically added fields.
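As a minimal sketch (the index name my-index and the field names are just examples), a mapping declaring a full-text field, a date with a custom format, and a geo-location looks like this:

# index and field names below are hypothetical
PUT my-index
{
  "mappings": {
    "properties": {
      "title":      { "type": "text" },
      "created_at": { "type": "date", "format": "yyyy-MM-dd" },
      "location":   { "type": "geo_point" }
    }
  }
}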

Metadata fields: Used to customise how a document’s associated metadata is treated, e.g. the _index, _id, and _source fields.
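Some of this metadata handling can be tuned in the mapping itself. For example, the _source field (which stores the original document) can be disabled, a sketch shown here only for illustration since it is rarely advisable in practice:

# hypothetical index name; disabling _source is usually not recommended
PUT my-index
{
  "mappings": {
    "_source": { "enabled": false }
  }
}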

Fields: A mapping contains a list of fields or properties pertinent to the document. Each field has its own data type.

Common data types: binary (a value encoded as a Base64 string), boolean, keywords (keyword, constant_keyword, wildcard), numbers, dates, and alias (an alias for an existing field).
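A mapping sketch touching most of these types (all field names are hypothetical) could look like:

# field names are hypothetical
PUT my-index
{
  "mappings": {
    "properties": {
      "blob":      { "type": "binary" },
      "is_active": { "type": "boolean" },
      "status":    { "type": "keyword" },
      "env":       { "type": "constant_keyword", "value": "prod" },
      "message":   { "type": "wildcard" },
      "views":     { "type": "long" },
      "published": { "type": "date" },
      "pub_date":  { "type": "alias", "path": "published" }
    }
  }
}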

Object and relational types:

Object: A JSON object.

flattened: An entire JSON object as a single field value.

nested: A JSON object that preserves the relationships between its subfields.

join: Defines a parent/child relationship between documents in the same index.
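A sketch covering all four types (names hypothetical); note that object fields flatten arrays of inner objects, while nested keeps each inner object queryable on its own:

# field names and join relations are hypothetical
PUT my-index
{
  "mappings": {
    "properties": {
      "driver":   { "type": "object" },
      "labels":   { "type": "flattened" },
      "authors":  { "type": "nested" },
      "relation": { "type": "join", "relations": { "question": "answer" } }
    }
  }
}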

Structured Data types:

Range: long_range, double_range, date_range, ip_range

ip: IPv4 and IPv6 addresses.

murmur3: Computes and stores hashes of values (requires the mapper-murmur3 plugin).
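For example (hypothetical names; as noted above, murmur3 additionally needs the mapper-murmur3 plugin, so it is omitted here):

PUT my-index
{
  "mappings": {
    "properties": {
      "client_ip":   { "type": "ip" },
      "allowed_ips": { "type": "ip_range" },
      "valid_dates": { "type": "date_range", "format": "yyyy-MM-dd" },
      "price_band":  { "type": "double_range" }
    }
  }
}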

Aggregate data types:

histogram: pre-aggregated numerical values
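A histogram field stores paired arrays of values and counts that are supplied pre-aggregated at index time; a sketch with hypothetical names:

PUT my-index
{
  "mappings": {
    "properties": {
      "latency": { "type": "histogram" }
    }
  }
}

# values must be in ascending order, counts has one entry per value
PUT my-index/_doc/1
{
  "latency": {
    "values": [0.1, 0.25, 0.35],
    "counts": [8, 17, 4]
  }
}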

Text analyzers:

Reference: link

Character filters: A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. An analyzer may have zero or more character filters, which are applied in order.
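For example, the built-in html_strip character filter removes HTML markup before tokenization; try it in the Kibana console:

POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<p>Hello <b>world</b>!</p>"
}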

Tokenizer: Receives the stream of characters, breaks it up into individual tokens, and outputs a stream of tokens, e.g. the whitespace tokenizer. An analyzer must have exactly one tokenizer.
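For example, the whitespace tokenizer splits only on whitespace:

POST _analyze
{
  "tokenizer": "whitespace",
  "text": "The QUICK brown fox"
}

This produces the tokens The, QUICK, brown, and fox.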

Token filters: A token filter receives the token stream and may add, remove, or change tokens, e.g. the lowercase token filter. An analyzer may have zero or more token filters, which are applied in order.
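Putting the three stages together, a custom analyzer composes zero or more character filters, exactly one tokenizer, and zero or more token filters in the index settings. A sketch (the analyzer and index names are hypothetical):

# my-index and my_analyzer are hypothetical names
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  }
}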

Built-in analyzers:

An excellent article on ngrams as filters and tokenizers.

ngram: Tokenizes text into all possible grams between the configured min and max lengths; this helps primarily with partial-match searching of items. Try this query in the Kibana console:

POST _analyze
{
  "tokenizer": "ngram",
  "text": "hello world!"
}
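With the tokenizer’s default min_gram of 1 and max_gram of 2, this emits tokens like h, he, e, el, and so on. The gram lengths are configurable by defining a custom tokenizer in the index settings; a sketch with hypothetical names:

# my-index, my_ngram, and my_ngram_analyzer are hypothetical names
PUT my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_ngram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      },
      "analyzer": {
        "my_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram"
        }
      }
    }
  }
}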

edge_ngram: Primarily used in type-ahead or auto-complete features; it tokenizes from the beginning of each word.

POST _analyze
{
  "tokenizer": "edge_ngram",
  "text": "hello world!"
}
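A common autocomplete pattern (a sketch; all names are hypothetical) applies edge_ngram at index time while searching with a plain analyzer, so a query like “hel” matches “hello”:

# index, tokenizer, and analyzer names are hypothetical
PUT my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "autocomplete_tok": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "autocomplete_tok",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}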

Some more references:

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html

https://medium.com/elasticsearch/introduction-to-analysis-and-analyzers-in-elasticsearch-4cf24d49ddab

https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch/
