Index initialization

Overview

picturesafe-search-enterprise provides methods to initialize or rebuild indices asynchronously with a single call.

Index initialization

If you want to create an Elasticsearch index based on given data, you may use one of the index initialization methods of the EnterpriseElasticsearchService. The index initialization is processed asynchronously, so your code does not have to wait for it to complete. The data ingested into the index is supplied by a DocumentProvider. If you also provide an IndexInitializationListener, it will be notified of the progress.

There are two different ways of defining a DocumentProvider:

Static DocumentProvider defined as a bean in the Spring context (will be autowired)

@Component
public class MyDocumentProvider implements DocumentProvider {

    @Override
    public void loadDocuments(String indexAlias, DocumentHandler handler) {
        ...
    }
}

elasticsearchService.createAndInitializeIndex("my-index", false,
        listener, DataChangeProcessingMode.BACKGROUND);

Dynamic DocumentProvider as a parameter to the index initialization method

elasticsearchService.createAndInitializeIndex("my-index", myDocumentProvider, false, 
        listener, DataChangeProcessingMode.BACKGROUND);

Index rebuild

If you have an existing Elasticsearch index and want to sync it with your data source or change its field definitions, you may use the rebuildIfExists parameter of the index initialization methods in EnterpriseElasticsearchService. The index rebuild is also processed asynchronously, and an IndexInitializationListener will be notified of the progress.

The existing index will still be searchable via its alias while the rebuild is in progress. Any update calls (insert, update or delete of documents) in the meantime will be queued and processed afterwards, so no changes will be lost.

elasticsearchService.createAndInitializeIndex("my-index", true, 
        listener, DataChangeProcessingMode.BACKGROUND);
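
The queuing of update calls during a rebuild can be pictured with the following stand-alone sketch. It only illustrates the idea of buffering changes and replaying them once the rebuild has finished; it is not how picturesafe-search-enterprise implements it, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Conceptual illustration only: all names here are hypothetical.
public class RebuildQueueSketch {

    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> applied = new ArrayList<>();
    private boolean rebuilding;

    /** Marks the start of a rebuild; later changes are buffered. */
    public synchronized void startRebuild() {
        rebuilding = true;
    }

    /** Applies a change immediately, or queues it while a rebuild is running. */
    public synchronized void submit(String change) {
        if (rebuilding) {
            pending.add(change);
        } else {
            applied.add(change);
        }
    }

    /** Ends the rebuild and replays the buffered changes in submission order. */
    public synchronized void finishRebuild() {
        rebuilding = false;
        while (!pending.isEmpty()) {
            applied.add(pending.poll());
        }
    }

    /** Returns a copy of the changes applied so far. */
    public synchronized List<String> applied() {
        return new ArrayList<>(applied);
    }
}
```

Replaying the queue in submission order keeps an insert followed by an update of the same document in the right sequence.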

DocumentProvider

Implement the DocumentProvider interface to have your data ingested into the Elasticsearch index. Its loadDocuments method is called asynchronously and must pass the data in chunks to the given DocumentHandler callback.

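@Component
public class MyDocumentProvider implements DocumentProvider {

    @Override
    public void loadDocuments(String indexAlias, DocumentHandler handler) {
        // hasDataToLoad(), loadChunk(), processedCount and totalCount are
        // placeholders for your own data access and bookkeeping code.
        while (hasDataToLoad()) {
            List<Map<String, Object>> chunk = loadChunk();
            // Hand each chunk to the callback together with the progress counters.
            handler.handleDocuments(
                new DocumentProvider.DocumentChunk(chunk, processedCount, totalCount));
        }
    }
}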

Note: If your chunk size is too large, the REST request to Elasticsearch may become too large. If it is too small, the overall performance of the index initialization or rebuild may suffer. A chunk size between 100 and 1000 documents is a reasonable starting point.
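
As a rough illustration of chunking, the following stand-alone sketch splits an in-memory list of documents into chunks of a fixed maximum size. It does not use any picturesafe-search-enterprise types; the class name ChunkingSketch and the method partition are hypothetical, and a real provider would typically load each chunk from its data source instead of holding all documents in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of picturesafe-search-enterprise.
public class ChunkingSketch {

    /** Splits the documents into consecutive chunks of at most chunkSize entries. */
    public static List<List<Map<String, Object>>> partition(
            List<Map<String, Object>> documents, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Map<String, Object>>> chunks = new ArrayList<>();
        for (int from = 0; from < documents.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, documents.size());
            // Copy the sub-list so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(documents.subList(from, to)));
        }
        return chunks;
    }
}
```

Each resulting chunk could then be passed to the DocumentHandler in turn, with the chunk size kept as a tunable setting.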

For special purposes, there are some predefined DocumentProvider implementations:

CsvDocumentProvider

Loads data from a CSV file.

Note: The first line of the CSV file is interpreted as the field names.
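
The header convention can be illustrated with a small stand-alone sketch that turns CSV lines into documents keyed by the names in the first line. This is not the CsvDocumentProvider implementation: the class name is hypothetical, a comma separator is assumed, and quoted fields are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the header convention; naive comma splitting only.
public class CsvHeaderSketch {

    /** Maps every data line to a document whose keys come from the first line. */
    public static List<Map<String, Object>> toDocuments(List<String> lines) {
        List<Map<String, Object>> documents = new ArrayList<>();
        if (lines.isEmpty()) {
            return documents;
        }
        // A limit of -1 keeps trailing empty columns.
        String[] fieldNames = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split(",", -1);
            Map<String, Object> document = new LinkedHashMap<>();
            // Columns without a matching header (or headers without a value) are skipped.
            for (int i = 0; i < fieldNames.length && i < values.length; i++) {
                document.put(fieldNames[i].trim(), values[i].trim());
            }
            documents.add(document);
        }
        return documents;
    }
}
```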

SiteCrawlerDocumentProvider

Loads data from web pages. It will read the HTML data of a given URL and crawl the contained links which refer to the same domain.
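
The same-domain restriction can be sketched as follows. This is only an illustration based on java.net.URI, not the SiteCrawlerDocumentProvider implementation; the class name is hypothetical, and "same domain" is assumed here to mean an identical host name.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; treats "same domain" as "same host name".
public class SameDomainSketch {

    /** Resolves links against the start URL and keeps those on the same host. */
    public static List<String> sameHostLinks(String startUrl, List<String> links) {
        URI start = URI.create(startUrl);
        String startHost = start.getHost();
        List<String> result = new ArrayList<>();
        for (String link : links) {
            // URI.create throws IllegalArgumentException for malformed links.
            URI resolved = start.resolve(link);
            String host = resolved.getHost();
            // Links without a host (for example mailto:) are dropped.
            if (host != null && host.equalsIgnoreCase(startHost)) {
                result.add(resolved.toString());
            }
        }
        return result;
    }
}
```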

TestDataDocumentProvider

Creates test data matching your index settings.

IndexInitializationListener

The IndexInitializationListener interface is a callback for the index initialization or rebuild process. The process notifies the listener of the step currently being performed, the number of documents processed so far, and the total number of documents.

public class MyInitializationListener implements IndexInitializationListener {

    @Override
    public void updateProgress(Event event) {
        logProgress(event.getType(), event.getDocumentsProcessed(), event.getTotalDocuments());
        
        if (event.getType() == IndexInitializationListener.Event.Type.END) {
            updateGuiState();
        }
    }
}
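
If the listener should report a percentage, the two counters can be combined as in the following sketch. It assumes that the processed and total document counts are available as numeric values; the class and method names are hypothetical and not part of picturesafe-search-enterprise.

```java
// Hypothetical helper for turning the two counters into a percentage.
public class ProgressSketch {

    /** Returns the progress as a whole percentage between 0 and 100. */
    public static int percent(long processed, long total) {
        if (total <= 0) {
            // Total unknown or empty: report no progress rather than dividing by zero.
            return 0;
        }
        // Clamp so an overshooting counter never reports more than 100 percent.
        long clamped = Math.max(0, Math.min(processed, total));
        return (int) (clamped * 100 / total);
    }
}
```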

IndexSettings

The IndexSettings class bundles the IndexPresetConfiguration and the FieldConfiguration list. If you want to provide these settings dynamically to the index initialization or rebuild call, you can pass an IndexSettings instance as a method parameter.

The provided IndexSettings will be persisted automatically by picturesafe-search-enterprise, so search requests can rely on them. In this case you do not need to define the IndexPresetConfiguration and the FieldConfiguration list as Spring beans.

elasticsearchService.createAndInitializeIndex(
        new IndexSettings(indexPresetConfiguration, fieldConfigurations), 
        myDocumentProvider, false, listener, DataChangeProcessingMode.BACKGROUND);

Please see the index creation and field configuration documentation for more details.