Version: Latest (4.54.0)

Insights

Introduction

The Insights accelerator is a collection of flows that analyze the data in the Content Store. To use the Insights accelerator, Elasticsearch must be installed and running. For comprehensive information on how to configure Elasticsearch, refer to the Elasticsearch guide. For a quick installation, follow the Quick Installation Steps chapter below.

The first flow, Insights (1. Analysis), performs the analysis on the Content Store and enriches the data with insights. The second flow, Insights (2. Ingest), ingests the enriched data into Elasticsearch.

note

One of the supported analyses is the duplicates analysis. To calculate duplicates, the BINARY documents in the Content Store must contain a hash. If the BINARY documents do not contain a hash, the duplicates analysis will not work as expected. You can use the Calculate hash accelerator to calculate the hash of BINARY documents.
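The duplicates analysis depends on grouping binaries by hash. The Python sketch below illustrates the idea behind the calculated fields hashCount, binaryUnique and binaryParentUnique listed in the data mapping table on this page. It is not the accelerator's implementation: the function names and the input shape (dicts with id, parent and hash keys) are hypothetical, and SHA-256 is an assumption, since the algorithm used by the Calculate hash accelerator is not specified here.

```python
import hashlib
from collections import Counter


def sha256_of(data: bytes) -> str:
    """Hex digest of a binary's content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def duplicate_flags(binaries):
    """Compute hashCount, binaryUnique and binaryParentUnique per binary.

    `binaries` is a list of dicts with hypothetical keys "id", "parent", "hash".
    Binaries without a hash are skipped: they cannot be compared.
    """
    hashed = [b for b in binaries if b.get("hash")]
    counts = Counter(b["hash"] for b in hashed)
    seen_hashes, seen_pairs = set(), set()
    flags = {}
    for b in hashed:
        pair = (b["hash"], b["parent"])
        flags[b["id"]] = {
            "hashCount": counts[b["hash"]],
            # True only for the first binary seen with this hash.
            "binaryUnique": b["hash"] not in seen_hashes,
            # True only for the first binary seen with this hash under this parent.
            "binaryParentUnique": pair not in seen_pairs,
        }
        seen_hashes.add(b["hash"])
        seen_pairs.add(pair)
    return flags


# Example: three copies of the same content, two of them in the same folder.
files = [
    {"id": "a", "parent": "/docs", "hash": sha256_of(b"report")},
    {"id": "b", "parent": "/docs", "hash": sha256_of(b"report")},
    {"id": "c", "parent": "/archive", "hash": sha256_of(b"report")},
    {"id": "d", "parent": "/docs", "hash": None},
]
flags = duplicate_flags(files)
```

Binary "d" has no hash and therefore gets no flags at all, which is why the duplicates analysis cannot work as expected when hashes are missing.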

After the data is ingested into Elasticsearch, you can use Kibana to visualize it. For comprehensive information on how to install and run Kibana, refer to the Kibana guide. For a quick installation, follow the Quick Installation Steps chapter below. This accelerator includes a pre-configured set of Kibana dashboards, located in the same folder as the accelerator, which can be imported into Kibana. For more information on how to import dashboards, refer to the Kibana documentation on managing saved objects.
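If you prefer to import the dashboards through the API rather than the Kibana UI, a request along these lines could be used. This is a minimal Python sketch, assuming the dashboards are shipped as a Kibana saved objects export (.ndjson) and that Kibana runs on its default port 5601; the file name and the helper function are hypothetical. It targets Kibana's saved objects import endpoint (POST /api/saved_objects/_import), which requires the kbn-xsrf header.

```python
import base64
import uuid
import urllib.request
from pathlib import Path


def build_import_request(kibana_url, ndjson_path, username, password, overwrite=True):
    """Build (without sending) a Kibana saved objects import request."""
    path = Path(ndjson_path)
    boundary = uuid.uuid4().hex
    # multipart/form-data body with a single "file" part holding the export.
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + path.read_bytes() + f"\r\n--{boundary}--\r\n".encode()
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    query = "?overwrite=true" if overwrite else ""
    return urllib.request.Request(
        f"{kibana_url.rstrip('/')}/api/saved_objects/_import{query}",
        data=body,
        method="POST",
        headers={
            "kbn-xsrf": "true",  # required by Kibana for write requests
            "Authorization": f"Basic {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage (hypothetical file name and credentials):
# req = build_import_request("http://localhost:5601", "insights-dashboards.ndjson",
#                            "elastic", "<password>")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

With overwrite=True, objects that already exist in Kibana are replaced, which is convenient when re-importing an updated set of dashboards.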

A wealth of insights can be obtained, including but not limited to:

  • Insights into the total number and size of the content
  • Insights into the structure of the content
  • Insights into versions and translations
  • Insights into duplicates
  • Insights into the quality and completeness of metadata
  • Insights into the content lifecycle

Quick Installation Steps

1. Download Elasticsearch and Kibana

2. Unzip the downloaded packages in the Software Repository

3. Run Elasticsearch for the first time

  • Navigate to the Elasticsearch root directory (e.g., D:\Software\elasticsearch-8.8.1-windows-x86_64\elasticsearch-8.8.1).
  • In the File Explorer address bar, type cmd and press Enter to open a Command Prompt in that directory.
  • In the Command Prompt, type bin\elasticsearch and press Enter.

4. Elasticsearch password and Kibana enrollment token

  • The Elasticsearch password is shown in the Elasticsearch terminal under "Password for the elastic user".
  • The default Elasticsearch username is elastic.
  • The Kibana enrollment token is shown in the same terminal under "Copy the following enrollment token and paste it into Kibana...".
  • Store both values in a safe location.

5. Run Kibana for the first time

  • Navigate to the Kibana root directory (e.g., D:\Software\kibana-8.8.1-windows-x86_64\kibana-8.8.1).
  • In the File Explorer address bar, type cmd and press Enter to open a Command Prompt in that directory.
  • In the Command Prompt, type bin\kibana-setup --enrollment-token <your enrollment token>, replacing <your enrollment token> with the token from step 4, and press Enter twice.
  • When the setup succeeds, type bin\kibana.bat in the same terminal and press Enter.

6. Validate
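With the default security auto-configuration of Elasticsearch 8, Elasticsearch serves HTTPS on https://localhost:9200 and Kibana is reachable at http://localhost:5601. A quick way to validate the Elasticsearch side is to request the cluster information as the elastic user and compare the reported version with the required 8.8.1. The Python sketch below assumes the CA certificate generated on first start at config\certs\http_ca.crt inside the Elasticsearch root directory; the helper names are hypothetical.

```python
import base64
import json
import ssl
import urllib.request

# Assumptions for the quick installation above; adjust to your environment.
ES_URL = "https://localhost:9200"
CA_CERT = r"D:\Software\elasticsearch-8.8.1-windows-x86_64\elasticsearch-8.8.1\config\certs\http_ca.crt"
REQUIRED_VERSION = "8.8.1"


def build_info_request(url, username, password):
    """GET / on Elasticsearch returns cluster information, including the version."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url.rstrip("/") + "/",
                                  headers={"Authorization": f"Basic {token}"})


def version_ok(info, required=REQUIRED_VERSION):
    """True if the cluster reports exactly the required version."""
    return info.get("version", {}).get("number") == required


def check_elasticsearch(password, username="elastic", url=ES_URL, cafile=CA_CERT):
    """Call Elasticsearch over HTTPS, trusting its generated CA certificate."""
    context = ssl.create_default_context(cafile=cafile)
    request = build_info_request(url, username, password)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return version_ok(json.load(response))


# Usage: check_elasticsearch("<password from step 4>")
```

For Kibana, opening http://localhost:5601 in a browser and logging in as the elastic user is usually enough to confirm it is running.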

note
  • Requires Elasticsearch 8.8.1 and Kibana 8.8.1.
  • It is recommended to run the Elasticsearch and Kibana instances on a separate machine.
  • After running Elasticsearch for the first time, activate Kibana with the enrollment token within 30 minutes, as the token is only valid for that period.

Re-running Insights

  • The first flow can be re-run to re-analyze the data in the Content Store.
  • The second flow can be re-run to re-ingest the data into Elasticsearch.
note
  • If you want to re-ingest the data into Elasticsearch, it is recommended to delete the index in Elasticsearch first.
  • For more information on how to delete an index, refer to the Elasticsearch delete index API documentation.
  • Alternatively, delete the index in Kibana under Stack Management > Index Management.
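Deleting the index can also be scripted with the Elasticsearch delete index API (DELETE /<index>). The Python sketch below is minimal and assumes basic authentication with the elastic user; the helper name is hypothetical, and the index name should match the indexName setting of the Insights (2. Ingest) flow.

```python
import base64
import urllib.parse
import urllib.request


def build_delete_index_request(es_url, index_name, username, password):
    """Build (without sending) a DELETE /<index> request for Elasticsearch."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{es_url.rstrip('/')}/{urllib.parse.quote(index_name, safe='')}",
        method="DELETE",
        headers={"Authorization": f"Basic {token}"},
    )


# Hypothetical values; use your indexName setting and the elastic password.
req = build_delete_index_request("https://localhost:9200", "insights", "elastic", "<password>")
# with urllib.request.urlopen(req, context=...) as resp:
#     print(resp.status)
```

For an HTTPS cluster, pass an SSL context that trusts the Elasticsearch CA certificate (for example ssl.create_default_context(cafile=...)) to urllib.request.urlopen when sending the request.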

Metadata

This chapter describes quality analysis, custom metadata, and the Elasticsearch data mapping.

Quality analysis

In addition to the pre-configured set of dashboards, Kibana offers a feature called Field Statistics that can be used to analyze metadata quality and completeness. For more information on how to use Field Statistics, please refer to the Kibana documentation.

Custom metadata

By default, the Insights accelerator is configured to analyze the Content Store metadata. To analyze custom metadata, modify the Insights (2. Ingest) flow by editing the two Template Engine components in the flow so that they include the custom metadata. Store the custom metadata as an object under the metadata key. For example:

{
  "operation": ...
  "data": {
    ... // Content Store metadata
    "metadata": {
      "customMetadata": "customMetadataValue"
    }
  }
}
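The exact Template Engine syntax is not shown here, but the structure the templates should produce can be illustrated. The Python sketch below is hypothetical: it takes an ingest document in the shape shown above and places custom fields under data.metadata, keeping any existing metadata keys.

```python
import copy


def add_custom_metadata(document, custom_fields):
    """Return a copy of an ingest document with custom fields under data.metadata.

    `document` follows the {"operation": ..., "data": {...}} shape shown above.
    Existing metadata keys are kept; keys in `custom_fields` take precedence.
    """
    result = copy.deepcopy(document)
    metadata = result.setdefault("data", {}).setdefault("metadata", {})
    metadata.update(custom_fields)
    return result
```

Working on a copy keeps the original document untouched, so the same source record can be reused for other outputs.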
note

The Insights accelerator has the following limitations:

  • Only the primary binary of a record is analyzed
  • If a document has multiple parents, only the first parent is analyzed

Elasticsearch data mapping

| Name | Mandatory | Type | Description | Content Store Path |
|---|---|---|---|---|
| _id | yes | | Unique identifier | _id |
| id | yes | keyword | Id of object or full path if file system | source.versionInfo.seriesId |
| type | yes | keyword | Type of object, e.g. CONTAINER | kind |
| subType | yes | keyword | SubType to specify a more detailed type, e.g. container | source.contentType.systemName |
| isFile | yes | boolean | Specifies if an object is/contains a binary | hardcoded |
| source | yes | keyword | Name of the source system | migration.origin |
| name | yes | text | Name of the object | source.name.systemName |
| name.keyword | automatically calculated | keyword | | |
| name.length | automatically calculated | token_count | | |
| name* | no | keyword | Fields with the prefix "name" will be automatically indexed | |
| description | no | text | Description of the object, e.g. title field | source.description |
| state | yes | keyword | One or more state values, e.g. hidden | source.states |
| hierarchy | yes | text | Full and unique hierarchy to the object | source.hierarchies |
| hierarchy.keyword | automatically calculated | keyword | | |
| hierarchy.length | automatically calculated | token_count | | |
| hierarchy.depth | automatically calculated | token_count | | |
| hierarchy.tree | automatically calculated | text | Can be used for special search use cases | |
| hierarchy.treeReversed | automatically calculated | text | Can be used for special search use cases | |
| url | no | keyword | Contains the full web URL in case of an ECM system | |
| parent | yes | keyword | Full parent hierarchy | Parent path of source.hierarchies |
| parent.tree | automatically calculated | text | Can be used for aggregation on the structure | |
| parentId | no | keyword | Unique id of the parent object | source.parentIds |
| dateCreated | yes | date | Creation date of the object | source.created.date |
| dateModified | yes | date | Last modified date of the object | source.lastModified.date |
| dateAccessed | no | date | Last accessed date of the object | source.lastAccessed.date |
| date* | no | date | Fields with the prefix "date" will be automatically indexed | |
| principalCreated | no | text | Principal that created the object, e.g. group | |
| principalCreated.keyword | automatically calculated | keyword | | |
| principalModified | no | text | Principal that last modified the object, e.g. user | |
| principalModified.keyword | automatically calculated | keyword | | |
| principalAccessed | no | text | Principal that last accessed the object, e.g. user | |
| principalAccessed.keyword | automatically calculated | keyword | | |
| principalAuthor | no | text | The author of the object | |
| principalAuthor.keyword | automatically calculated | keyword | | |
| principalOwner | no | text | The owner of the object | |
| principalOwner.keyword | automatically calculated | keyword | | |
| principal* | no | text | Fields with the prefix "principal" will be automatically indexed | |
| binaryExtension | yes (for files) | keyword | Extension of the binary; null if empty | source.binaries.source.rawExtension |
| binaryExtension.normal | automatically calculated | keyword | Normalized version of the value | |
| binaryExtension.length | automatically calculated | token_count | | |
| binaryByteSize | yes (for files) | long | Size of the binary in bytes | source.binaries.source.byteSize |
| binaryHash | no | keyword | Hash of the binary | source.binaries.source.hash |
| reversedVersionOrder | yes | integer | A number specifying the order of versions in reverse | reversedVersionOrder |
| versionCount | yes | integer | Number of versions of the object, including the current version | versionCount |
| language | yes | keyword | Specifies the language of an object | source.language.systemName |
| isOriginalLanguage | yes | boolean | Specifies if the object is in the original language | source.language.masterReference |
| analyticsBinaryHashCount | automatically calculated | integer | Calculated field containing the number of times a hash exists | source.binaries.source.properties.insights.hashCount |
| analyticsBinaryUnique | automatically calculated | boolean | Calculated field set to true for only one of the objects per hash | source.binaries.source.properties.insights.binaryUnique |
| analyticsBinaryParentUnique | automatically calculated | boolean | Calculated field set to true for only one of the objects per hash sharing the same parent | source.binaries.source.properties.insights.binaryParentUnique |
| analyticsHasChildren | automatically calculated | boolean | Calculated field set to true if the object has child objects | source.properties.insights.hasChildren |
| analyticsClassification.* | | keyword | The results of the classification process based on the binaryExtension, grouped by binaryFileSize | source.binaries.source.properties.insights.analyticsClassification |
| analyticsTranslationCount | no | integer | Calculated field containing the number of times an object is translated | |
| analyticsAvailableTranslations | no | keyword | Calculated field containing all available translations for an object | |
| purviewAppliedLabel | no | keyword | The label applied by Purview | source.properties.purview.appliedLabel |
| purviewInformationTypes | no | keyword | The information types found by Purview | source.properties.purview.informationTypes |
| migrate | yes | boolean | Whether the document will be migrated | migration.migrate |
| migrationId | yes | keyword | The Id in the target system after migration | migration.id |
| migrationFailed | yes | boolean | Indicates if the migration failed for this document | migration.failed |
| migrationFailedMessage | yes | text | Indicates the reason for a failed migration | migration.failedMessage |
| metadata | no | object | Object field to store any additional metadata | |
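The automatically calculated subfields (such as name.keyword and name.length) and the prefix rules (such as date*) correspond to Elasticsearch multi-fields and dynamic templates. The excerpt below, written as a Python dict, shows one way such a mapping could be declared; it is an illustrative assumption covering a few fields only, not the accelerator's actual mapping. Note that a token_count field requires an analyzer.

```python
import json

# Illustrative excerpt only, not the accelerator's actual mapping.
index_body = {
    "mappings": {
        # New fields whose names start with "date" or "name" get a fixed type.
        "dynamic_templates": [
            {"date_prefix": {"match": "date*", "mapping": {"type": "date"}}},
            {"name_prefix": {"match": "name*", "mapping": {"type": "keyword"}}},
        ],
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {"type": "keyword"},  # name.keyword
                    # name.length: number of tokens produced by the analyzer
                    "length": {"type": "token_count", "analyzer": "standard"},
                },
            },
            "isFile": {"type": "boolean"},
            "binaryByteSize": {"type": "long"},
            "dateCreated": {"type": "date"},
            "metadata": {"type": "object"},
        },
    }
}

# JSON body for creating the index with PUT /<index>.
print(json.dumps(index_body, indent=2))
```

Explicitly mapped fields such as name take precedence; dynamic templates only apply to new fields that are not yet in the mapping.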
note

When storing additional fields in the metadata object, the type of the first value stored for a field determines the type of that field. A later value of a different type causes a validation error. For example:

  • The first document has a metadata.registrationDate field with the value "20th September 2021". The type of this field will be keyword, as the value is a plain string.
  • A later document also has a metadata.registrationDate field, but with the value "2021-09-20T21:50:28.342Z". This value is detected as a date, which conflicts with the keyword type and causes a validation error.

By default, documents are ingested in batches of 1000. If one document in a batch fails, for example due to a validation error as described above, the whole batch is not imported.

For more information on the different field data types in Elasticsearch, refer to the Elasticsearch documentation on field data types.
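The type rule for the metadata object, combined with batch ingestion, can be modeled to find out in advance which batches would be rejected. The Python sketch below is a simplified, hypothetical model, not Elasticsearch's dynamic mapping logic: it treats ISO 8601 strings as date and other strings as keyword, locks each field to the type of its first value, and flags every batch of 1000 that contains a conflicting value.

```python
from datetime import datetime


def infer_type(value):
    """Simplified type detection (assumption, not Elasticsearch's exact rules)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        try:
            # Accept ISO 8601 timestamps such as 2021-09-20T21:50:28.342Z.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return "date"
        except ValueError:
            return "keyword"
    return "object"


def rejected_batches(metadata_docs, field, batch_size=1000):
    """Return the numbers of batches that would fail because of `field`.

    The first value seen locks the field's type; a later value of another
    type is a validation error, which rejects its whole batch.
    """
    locked_type = None
    rejected = set()
    for position, doc in enumerate(metadata_docs):
        if field not in doc:
            continue
        value_type = infer_type(doc[field])
        if locked_type is None:
            locked_type = value_type
        elif value_type != locked_type:
            rejected.add(position // batch_size)
    return sorted(rejected)
```

Normalizing values (for example, always writing dates in one format) before ingestion avoids these conflicts altogether.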


Flows

This chapter describes the configuration of the flows Insights (1. Analysis) and Insights (2. Ingest).

Insights (1. Analysis) settings

Performs the analysis on the Content Store and enriches the data with insights.

mongoConnection

The Mongo connection string including the database name to connect to.
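Because the connection string must include the database name, a quick pre-check can catch a missing database path before running the flow. A minimal Python sketch using only the standard library; the function name and the example database name contentstore are hypothetical.

```python
from urllib.parse import urlsplit


def database_name(connection_string):
    """Return the database name embedded in a MongoDB connection string.

    Raises ValueError when the database path is missing, e.g.
    "mongodb://localhost:27017" instead of "mongodb://localhost:27017/contentstore".
    """
    parts = urlsplit(connection_string)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    name = parts.path.lstrip("/")
    if not name:
        raise ValueError("connection string does not include a database name")
    return name
```

The same check applies to the mongoConnection setting of the Insights (2. Ingest) flow.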

Insights (2. Ingest) settings

Ingests the data into Elasticsearch.

You can change which data is ingested by modifying the query of the Document Retrieve component in the flow.

The default query is:

{
  "kind": {
    "$in": ["CONTAINER", "RECORD"]
  },
  "source.versionInfo.isCurrent": true
}

You can extend this query to include additional filters. For example:

{
  "kind": {
    "$in": ["CONTAINER", "RECORD"]
  },
  "source.versionInfo.isCurrent": true,
  "source.contentType.systemName": "myContentType"
}
note

The isCurrent filter is required because the Include source versions checkbox is checked; the filter ensures that only the current version of each object is ingested.
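When adding filters, it helps to keep the isCurrent condition fixed. The Python sketch below builds the query from the default one and includes a tiny matcher for trying it against sample documents. The helper names are hypothetical, and the matcher only understands plain equality and $in on dotted paths, not MongoDB's full query semantics.

```python
import copy
import json

# Default query of the Document Retrieve component (from this page).
BASE_QUERY = {
    "kind": {"$in": ["CONTAINER", "RECORD"]},
    "source.versionInfo.isCurrent": True,
}


def build_query(extra_filters=None):
    """Return the default query extended with extra filters; isCurrent stays true."""
    query = copy.deepcopy(BASE_QUERY)
    query.update(extra_filters or {})
    query["source.versionInfo.isCurrent"] = True  # required, see note above
    return query


def get_path(doc, dotted_path):
    """Follow a dotted path such as "source.versionInfo.isCurrent"; None if missing."""
    for key in dotted_path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def matches(doc, query):
    """Check one document against equality and $in conditions only."""
    for path, condition in query.items():
        value = get_path(doc, path)
        if isinstance(condition, dict) and "$in" in condition:
            if value not in condition["$in"]:
                return False
        elif value != condition:
            return False
    return True


query = build_query({"source.contentType.systemName": "myContentType"})
print(json.dumps(query, indent=2))  # paste into the Document Retrieve component
```

Forcing isCurrent back to true after merging means an extra filter can never accidentally switch the flow to ingesting all source versions.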

elasticsearchConnection

Elasticsearch connection string

Example: http://localhost:9200

elasticsearchUsername

Elasticsearch username

elasticsearchPassword

Elasticsearch password

elasticsearchCertificatePath

Path to the Elasticsearch certificate

Example: C:\certificates\elasticsearch\elasticsearch.crt

mongoConnection

The Mongo connection string including the database name to connect to.

indexName

The name of the index to use in Elasticsearch.
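Elasticsearch restricts index names: they must be lowercase; must not contain \, /, *, ?, ", <, >, |, space, comma, # or :; must not start with -, _ or +; must not be . or ..; and must not be longer than 255 bytes. A minimal Python sketch that checks a candidate indexName against these rules; the function name is hypothetical.

```python
FORBIDDEN_CHARACTERS = set('\\/*?"<>| ,#:')


def index_name_problems(name):
    """Return a list of rule violations for an Elasticsearch index name."""
    if not name:
        return ["must not be empty"]
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    forbidden = sorted(set(name) & FORBIDDEN_CHARACTERS)
    if forbidden:
        problems.append(f"contains forbidden characters: {forbidden}")
    if name[0] in "-_+":
        problems.append("must not start with -, _ or +")
    if name in (".", ".."):
        problems.append("must not be . or ..")
    if len(name.encode("utf-8")) > 255:
        problems.append("must not be longer than 255 bytes")
    return problems
```

An empty result means the name passes these checks; Elasticsearch itself remains the final authority when the index is created.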