Indexing

Documents can be added to the Elasticsearch index individually or in bulk.

For a larger number of documents, the bulk operation is recommended for efficient processing within Elasticsearch.

Adding single documents

A document, which is represented by a Map of field names and values, can be added to the Elasticsearch index as follows:

Map<String, Object> doc = ...
singleIndexElasticsearchService.addToIndex(DataChangeProcessingMode.BACKGROUND, doc);
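As an illustration of what such a Map might look like, here is a minimal sketch that assembles a document with the standard Java collections only. The class name SingleDocumentSketch and the helper newDocument are made up for this example, and the field names id and title are borrowed from the example further down; they are assumptions, not requirements of the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SingleDocumentSketch {

    // Builds one document as a plain map of field names to values.
    // Hypothetical helper, not part of the library.
    static Map<String, Object> newDocument(String id, String title) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);       // assumed id field name, as in the example further down
        doc.put("title", title); // illustrative field
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = newDocument("1", "This is a test title");
        System.out.println(doc);
    }
}
```

A map built this way could then be passed as the doc argument in the call shown above.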

The DocumentBuilder can be used to create the documents in an easy and fluent way.

The values are added to the index based on the FieldConfiguration. If no FieldConfiguration exists for a field, the data type is determined by Elasticsearch.

Bulk indexing

Multiple documents can be added as a List to the Elasticsearch index as follows:

List<Map<String, Object>> docs = ...
singleIndexElasticsearchService.addToIndex(DataChangeProcessingMode.BACKGROUND, docs);

With bulk indexing, documents are added to the Elasticsearch index internally in chunks (default chunk size: 1000 documents).

The default indexing bulk size can be overridden in elasticsearch.properties:

elasticsearch.service.indexing_bulk_size=1000
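To picture what chunking means for a large list, the following sketch splits a list into consecutive sublists of at most a given size. It is only an illustration under stated assumptions: the class BulkChunkingSketch and the method chunk are hypothetical and do not show how the library performs its chunking internally.

```java
import java.util.ArrayList;
import java.util.List;

final class BulkChunkingSketch {

    // Splits docs into consecutive sublists of at most chunkSize elements.
    // Hypothetical helper for illustration only.
    static <T> List<List<T>> chunk(List<T> docs, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < docs.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, docs.size());
            chunks.add(new ArrayList<>(docs.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);
        }
        // 2500 documents with a chunk size of 1000: 1000 + 1000 + 500
        System.out.println(chunk(docs, 1000).size()); // prints 3
    }
}
```

With the default chunk size of 1000, a list of 2500 documents would therefore be processed in three chunks, the last one holding the remaining 500 documents.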

DataChangeProcessingMode

There are two processing modes for adding documents to the index:

Background - Use this mode whenever possible!

DataChangeProcessingMode.BACKGROUND

This mode allows the application to handle data changes asynchronously in the background. When the operation returns, the changes may not yet be visible in the search index. They might apply immediately, but in general they become visible a few seconds later.

Processing data changes in the background, or tolerating caches that are temporarily out of sync, significantly reduces the load on an application.

Please see the Elasticsearch documentation for further information.

Blocking - Be careful when using this mode!

DataChangeProcessingMode.BLOCKING

When the operation is finished, all the changes are available in the search index.

Subsequent searches will reflect the data changes immediately. Using this mode can trigger additional load and may slow down the current use case as well as other use cases.

Changing too much data in this mode, or using it in too many parallel executions, might trigger very heavy additional load, causing the application to pause or block.
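The difference between the two modes, as seen by the caller, can be modelled with plain Java concurrency. The sketch below is a toy model only: the class ProcessingModeSketch, its Mode enum, and the in-memory map standing in for the search index are all hypothetical, and nothing here reflects how the library or Elasticsearch implements these modes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class ProcessingModeSketch {

    // Hypothetical stand-in for DataChangeProcessingMode, not the library enum.
    enum Mode { BACKGROUND, BLOCKING }

    // Toy "search index": only entries in this map count as searchable.
    private final Map<String, String> searchable = new ConcurrentHashMap<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    void add(Mode mode, String id, String title) {
        // The change is always applied on the worker thread.
        Future<?> pending = worker.submit(() -> { searchable.put(id, title); });
        if (mode == Mode.BLOCKING) {
            try {
                pending.get(); // wait until the change is searchable
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        // BACKGROUND: return at once; the change becomes searchable later.
    }

    boolean isSearchable(String id) {
        return searchable.containsKey(id);
    }

    // Waits for queued changes to finish, then stops the worker.
    void close() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ProcessingModeSketch index = new ProcessingModeSketch();
        index.add(Mode.BLOCKING, "1", "This is a test title");
        System.out.println(index.isSearchable("1")); // true: add() waited
        index.add(Mode.BACKGROUND, "2", "This is another test title");
        // isSearchable("2") may be true or false at this point
        index.close();
        System.out.println(index.isSearchable("2")); // true once the worker has drained
    }
}
```

In this model the blocking call pays for its guarantee by making the caller wait, which is why heavy or highly parallel use of the blocking mode can slow an application down.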

DocumentBuilder

Documents can be easily added to the index using the DocumentBuilder.

The following example is based on a simple FieldConfiguration of three fields:

@Bean
List<FieldConfiguration> fieldConfigurations() {
    return Arrays.asList(
            FieldConfiguration.ID_FIELD,
            FieldConfiguration.FULLTEXT_FIELD,
            StandardFieldConfiguration.builder("title", ElasticsearchType.TEXT)
                .copyToFulltext(true).sortable(true).build()
    );
}

The field definition above results in the following Elasticsearch mapping:

{
  "mapping": {
    "_doc": {
      "properties": {
        "fulltext": {
          "type": "text"
        },
        "id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          },
          "copy_to": [
            "fulltext"
          ]
        }
      }
    }
  }
}

Adding three documents with fields id and title to the index:

singleIndexElasticsearchService.addToIndex(DataChangeProcessingMode.BACKGROUND,
    Arrays.asList(
        DocumentBuilder.id(1).put("title", "This is a test title").build(),
        DocumentBuilder.id(2).put("title", "This is another test title").build(),
        DocumentBuilder.id(3).put("title", "This is one more test title").build()
    ));

This results in the following documents in the Elasticsearch index:

{
  "docs": [
    {
      "_index": "picturesafe-search-sample-20200324-183945-762",
      "_type": "_doc",
      "_id": "1",
      "_version": 1,
      "_seq_no": 0,
      "_primary_term": 1,
      "found": true,
      "_source": {
        "id": "1",
        "title": "This is a test title"
      }
    },
    {
      "_index": "picturesafe-search-sample-20200324-183945-762",
      "_type": "_doc",
      "_id": "2",
      "_version": 1,
      "_seq_no": 1,
      "_primary_term": 1,
      "found": true,
      "_source": {
        "id": "2",
        "title": "This is another test title"
      }
    },
    {
      "_index": "picturesafe-search-sample-20200324-183945-762",
      "_type": "_doc",
      "_id": "3",
      "_version": 1,
      "_seq_no": 2,
      "_primary_term": 1,
      "found": true,
      "_source": {
        "id": "3",
        "title": "This is one more test title"
      }
    }
  ]
}