ELK mappings and analysers
--
For some basics to understand ELK, please go through this blog first.
Mappings are like classes in OOP: you define the fields and the data types of the fields in a document, and these documents are stored in an index.
Reference: link
A mapping defines, among other things: which string fields should be treated as full-text fields, which fields contain numbers, dates or geo-locations, the format of date values, and custom rules to control the mapping of dynamically added fields.
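As a minimal sketch (the index and field names below are just placeholders), an explicit mapping can be supplied when the index is created, and GET my-index/_mapping returns it back:
PUT my-index
{
  "mappings": {
    "properties": {
      "title":      { "type": "text" },
      "status":     { "type": "keyword" },
      "created_at": { "type": "date", "format": "yyyy-MM-dd" },
      "is_active":  { "type": "boolean" }
    }
  }
}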
Metadata fields: used to customise how a document’s associated metadata is treated, e.g. the _index, _id and _source fields.
Fields: a mapping contains a list of fields or properties pertinent to the document. Each field has its own data type.
Common data types: binary (value encoded as a Base64 string), boolean, keywords (keyword, constant_keyword, wildcard), numbers, dates, alias (for an existing field)
Object and relational types:
Object: A JSON object.
flattened: An entire JSON object as a single field value
nested: A JSON object that preserves the relationship between its sub-fields (see the sketch after this list)
join: Defines a parent/child relationship between documents in the same index
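A minimal sketch of a nested mapping (index and field names are placeholders); each object in the user array is indexed internally as its own document, which is what preserves the sub-field relationships:
PUT my-index-nested
{
  "mappings": {
    "properties": {
      "user": {
        "type": "nested",
        "properties": {
          "first": { "type": "keyword" },
          "last":  { "type": "keyword" }
        }
      }
    }
  }
}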
Structured Data types:
Range: long_range, double_range, date_range, ip_range
ip: IPv4, IPv6
murmur3: computes and stores hashes of values.
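A short sketch of the structured types (index and field names are placeholders): an ip field stores a single address, while a range field stores a gte/lte pair:
PUT my-index-structured
{
  "mappings": {
    "properties": {
      "client_ip": { "type": "ip" },
      "timeframe": { "type": "date_range", "format": "yyyy-MM-dd" }
    }
  }
}
POST my-index-structured/_doc
{
  "client_ip": "192.168.1.10",
  "timeframe": { "gte": "2021-01-01", "lte": "2021-12-31" }
}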
Aggregate data types:
histogram: pre-aggregated numerical values
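A histogram field stores the pre-aggregated buckets directly in the document as parallel values/counts arrays; a minimal sketch with placeholder names:
PUT my-index-histo
{
  "mappings": {
    "properties": {
      "latency_histogram": { "type": "histogram" }
    }
  }
}
POST my-index-histo/_doc
{
  "latency_histogram": {
    "values": [0.1, 0.25, 0.5, 1.0],
    "counts": [10, 42, 7, 1]
  }
}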
Text analyzers:
Reference: link
Character filters: a character filter receives the original text as a stream of characters and can transform the stream by adding, removing or changing characters. An analyzer may have zero or more character filters, which are applied in order.
Tokenizer: receives the stream of characters, breaks it up into individual tokens and outputs a stream of tokens, e.g. the whitespace tokenizer. An analyzer must have exactly one tokenizer.
Token filters: a token filter receives the token stream and may add, remove or change tokens, e.g. the lowercase token filter. An analyzer can have zero or more token filters.
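Putting the three building blocks together, a custom analyzer is declared in the index settings by naming one tokenizer plus any character filters and token filters. This is only a sketch with placeholder index/analyzer names, using built-in components:
PUT my-index-analysis
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
POST my-index-analysis/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "<b>Héllo World</b>"
}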
Built-in analyzers:
There is an excellent article on ngrams as filters and tokenizers (see the references at the end).
ngram: tokenizes based on the min and max gram sizes, emitting all the possible combinations; this helps primarily with partial-match searching of items. Try this query in the Kibana console:
POST _analyze{"tokenizer": "ngram","text":"hello world!"}
edge_ngram: primarily used in type-ahead or auto-complete features; it tokenizes from the beginning of words, so only prefixes are emitted.
POST _analyze{"tokenizer": "edge_ngram","text":"hello world!"}
Some more references:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html
https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch/