Version: Latest (4.60.0)

Insights

Introduction

The Insights accelerator is a collection of flows that can be used to analyze the data in the Content Store. To use the Insights accelerator, you need to have Elasticsearch installed and running. For more comprehensive information on how to configure Elasticsearch, please refer to the Elasticsearch docs. For a quick installation guide, follow the Quick Installation Steps chapter below.

The first flow Insights (1. Analysis) is the flow that performs the analysis on the Content Store and enriches the data with insights. The second flow Insights (2. Ingest) is the flow that ingests the data into Elasticsearch.

note

One of the supported analyses is the analysis of duplicates. To calculate duplicates, the BINARY documents in the Content Store must contain a hash. If the BINARY documents do not contain a hash, the duplicates analysis will not work as expected. You can use the Calculate hash accelerator to calculate the hash of BINARY documents.
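To illustrate why the hash is required, here is a minimal sketch of hash-based duplicate detection. It is not the accelerator's implementation; the input shape (a list of dictionaries with `id` and `hash` keys) and the output keys are assumptions made for this example only.

```python
from collections import Counter


def analyze_duplicates(binaries):
    """Annotate each binary with a hash count and a 'unique' flag.

    `binaries` is a list of dicts with "id" and "hash" keys
    (hypothetical shape, not the Content Store schema).
    """
    counts = Counter(b["hash"] for b in binaries if b.get("hash"))
    seen = set()
    results = []
    for b in binaries:
        h = b.get("hash")
        if not h:
            # Without a hash the binary cannot take part in duplicate analysis.
            results.append({"id": b["id"], "hashCount": None, "unique": None})
            continue
        # Only the first object seen for a given hash is flagged as unique.
        results.append({"id": b["id"], "hashCount": counts[h], "unique": h not in seen})
        seen.add(h)
    return results


sample = [
    {"id": "a", "hash": "abc"},
    {"id": "b", "hash": "abc"},
    {"id": "c", "hash": "def"},
    {"id": "d", "hash": None},
]
results = analyze_duplicates(sample)
```

In this sketch a binary without a hash gets no count and no flag, which mirrors the statement that the analysis does not work as expected when hashes are missing.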

After the data is ingested into Elasticsearch, you can use Kibana to visualize the data. For more comprehensive information on how to install and run Kibana, please refer to the Kibana docs. For a quick installation guide, follow the Quick Installation Steps chapter below. As part of this accelerator, a pre-configured set of Kibana dashboards is provided in the same folder where you found this accelerator. These dashboards can be imported into Kibana. For more information on how to import dashboards into Kibana, please refer to the Kibana saved objects documentation.

A wealth of insights can be obtained, including but not limited to:

Insights into the total number and size of the content
Insights into the structure of the content
Insights into versions and translations
Insights into duplicates
Insights into the quality and completeness of metadata
Insights into the content lifecycle

Quick Installation Steps

1. Download Elasticsearch and Kibana

Elasticsearch download package | Kibana download package

2. Unzip downloaded packages in the Software Repository

3. Run Elasticsearch for the first time

Navigate to the Elasticsearch root directory (e.g., D:\Software\elasticsearch-9.3.2).
In the address bar, type cmd and press Enter.
In the Command Prompt, type bin\elasticsearch and press Enter.

4. Elasticsearch password and Kibana enrollment token

The Elasticsearch password is shown in the Elasticsearch terminal under Password for the elastic user.
The Elasticsearch username is elastic by default.
The Kibana enrollment token is shown in the same terminal under Copy the following enrollment token and paste it into Kibana....
Store both values in a safe location.

5. Run Kibana for the first time

Navigate to the Kibana root directory (e.g., D:\Software\kibana-9.3.2).
In the address bar, type cmd and press Enter.
In the Command Prompt, type bin\kibana-setup --enrollment-token <your enrollment token> and press Enter twice!
Make sure to change the enrollment-token value accordingly. When successful, in the same terminal type bin\kibana.bat and press Enter.

6. Validate

Go to Elasticsearch Homepage, enter the username and password. Done!

note

Requires Elasticsearch 9.3.2 and Kibana 9.3.2.
It is recommended to run the Elasticsearch and Kibana instances on a separate machine.
After running Elasticsearch for the first time please activate Kibana with the enrollment token within 30 minutes.
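The validation step can also be scripted. The sketch below builds an authenticated request to the Elasticsearch root endpoint and, when `cluster_version` is called, reads the reported version number. The URL `https://localhost:9200`, the use of HTTP Basic authentication, and the function names are assumptions for illustration; adjust them to your installation.

```python
import base64
import json
import ssl
import urllib.request


def build_request(base_url, username, password):
    """Build a GET request for the cluster root endpoint with HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + "/")
    request.add_header("Authorization", f"Basic {token}")
    return request


def cluster_version(base_url, username, password, ca_cert_path=None):
    """Return the version number reported by the cluster (performs a network call)."""
    # A custom CA file is only needed when the cluster uses a self-signed certificate.
    context = ssl.create_default_context(cafile=ca_cert_path) if ca_cert_path else None
    request = build_request(base_url, username, password)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)["version"]["number"]


# Building the request does not contact the server.
request = build_request("https://localhost:9200", "elastic", "<your password>")
```

Calling `cluster_version(...)` against a running instance should return the installed version, which can be compared with the required version listed in the note above.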

Re-running Insights

The first flow can be re-run to re-analyze the data in the Content Store.
The second flow can be re-run to re-ingest the data into Elasticsearch after setting enableReset to true in the flow variables.

note

If you want to re-ingest the data into Elasticsearch, it is recommended to delete the index in Elasticsearch first.
For more information on how to delete an index, please refer to the Elasticsearch documentation on deleting indices.
Alternatively, navigate to Stack Management > Index Management in Kibana.
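Deleting the index can also be done through the Elasticsearch REST API, which accepts `DELETE /<index>`. The following sketch only builds the request; the base URL and the index name `insights` are placeholders, and authentication headers would still need to be added for a secured cluster.

```python
import urllib.parse
import urllib.request


def build_delete_index_request(base_url, index_name):
    """Build (but do not send) a DELETE /<index> request."""
    if not index_name or any(ch in index_name for ch in "*,"):
        # Refuse wildcards and lists so that only one named index can be deleted.
        raise ValueError("index_name must be a single, explicit index name")
    url = base_url.rstrip("/") + "/" + urllib.parse.quote(index_name, safe="")
    return urllib.request.Request(url, method="DELETE")


# "insights" is a hypothetical index name; use the value of indexName from your flow.
request = build_delete_index_request("http://localhost:9200", "insights")
```

The request can be sent with `urllib.request.urlopen(request)` once credentials are attached. Rejecting wildcards is a design choice in this sketch to avoid deleting more than the intended index.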

Metadata

This chapter describes quality analysis, custom metadata, and the Elasticsearch data mapping.

Quality analysis

In addition to the pre-configured set of dashboards, Kibana offers a feature called Field Statistics that can be used to analyze metadata quality and completeness. For more information on how to use Field Statistics, please refer to the Kibana documentation.

Custom metadata

By default the Insights accelerator is configured to analyze the data in the Content Store metadata. If you want to analyze custom metadata, you can modify the Insights (2. Ingest) flow to include the custom metadata by editing the two Template Engine components in the flow. Store the custom metadata in the metadata key as an object. For example:

{
  "operation": ...,
  "data": {
    ... // Content Store metadata
    "metadata": {
      "customMetadata": "customMetadataValue"
    }
  }
}
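The shape above can be illustrated outside the flow with a short sketch. In the accelerator this is done in the Template Engine components; the helper name `add_custom_metadata` and the sample `data` content are hypothetical and only show how custom fields end up nested under the `metadata` key.

```python
import json


def add_custom_metadata(data, custom_metadata):
    """Return a copy of `data` with the custom fields stored under "metadata"."""
    merged = dict(data)
    # Keep any existing metadata fields and add (or overwrite with) the custom ones.
    merged["metadata"] = {**data.get("metadata", {}), **custom_metadata}
    return merged


data = {"name": "report.pdf"}  # placeholder for the Content Store metadata
document = add_custom_metadata(data, {"customMetadata": "customMetadataValue"})
print(json.dumps(document, indent=2))
```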

note

The Insights accelerator has the following limitations:

Only the primary binary of a record is analyzed
If a document has multiple parents, only the first parent is analyzed

Elasticsearch data mapping

Name	Mandatory	Type	Description	Content Store Path
_id	yes		Unique identifier	_id
id	yes	keyword	Id of object or full path if file system	source.versionInfo.seriesId
type	yes	keyword	Type of object, e.g. CONTAINER	kind
subType	yes	keyword	SubType to specify a more detailed type, e.g. container	source.contentType.systemName
isFile	yes	boolean	Specifies if an object is/contains a binary	hardcoded
source	yes	keyword	Name of the source system	migration.origin
name	yes	text	Name of the object	source.name.systemName
name.keyword	automatically calculated	keyword
name.length	automatically calculated	token_count
name*	no	keyword	Fields with the prefix "name" will be automatically indexed
description	no	text	Description of the object, e.g. title field	source.description
state	yes	keyword	One or more state values, e.g. hidden	source.states
hierarchy	yes	text	Full and unique hierarchy to the object	source.hierarchies
hierarchy.keyword	automatically calculated	keyword
hierarchy.length	automatically calculated	token_count
hierarchy.depth	automatically calculated	token_count
hierarchy.tree	automatically calculated	text	Can be used for special search use cases
hierarchy.treeReversed	automatically calculated	text	Can be used for special search use cases
url	no	keyword	Contains the full web url in case of an ECM system
parent	yes	keyword	Full parent hierarchy	Parent path of source.hierarchies
parent.tree	automatically calculated	text	Can be used for aggregation on the structure
parentId	no	keyword	Unique id of the parent object	source.parentIds
dateCreated	yes	date	Creation date of the object	source.created.date
dateModified	yes	date	Last modified date of the object	source.lastModified.date
dateAccessed	no	date	Last accessed date of the object	source.lastAccessed.date
date*	no	date	Fields with the prefix "date" will be automatically indexed
principalCreated	no	text	Principal that created the object, e.g. group
principalCreated.keyword	automatically calculated	keyword
principalModified	no	text	Principal that last modified the object, e.g. user
principalModified.keyword	automatically calculated	keyword
principalAccessed	no	text	Principal that last accessed the object, e.g. user
principalAccessed.keyword	automatically calculated	keyword
principalAuthor	no	text	The author of the object
principalAuthor.keyword	automatically calculated	keyword
principalOwner	no	text	The owner of the object
principalOwner.keyword	automatically calculated	keyword
principal*	no	text	Fields with the prefix "principal" will be automatically indexed
binaryExtension	yes (for files)	keyword	Extension of the binary, if empty then null	source.binaries.source.rawExtension
binaryExtension.normal	automatically calculated	keyword	Normalized version of the value
binaryExtension.length	automatically calculated	token_count
binaryByteSize	yes (for files)	long	Size in bytes of the binary	source.binaries.source.byteSize
binaryHash	no	keyword	Hash of the binary	source.binaries.source.hash
reversedVersionOrder	yes	integer	A number specifying the order of versions in a reversed manner	reversedVersionOrder
versionCount	yes	integer	Number of versions for the object, including the current version	versionCount
language	yes	keyword	Specifies the language of an object	source.language.systemName
isOriginalLanguage	yes	boolean	Specifies if the object is in the original language	source.language.masterReference
analyticsBinaryHashCount	automatically calculated	integer	Calculated field containing the number of times a hash exists	source.binaries.source.properties.insights.hashCount
analyticsBinaryUnique	automatically calculated	boolean	Calculated field set to true for only one of the objects per hash	source.binaries.source.properties.insights.binaryUnique
analyticsBinaryParentUnique	automatically calculated	boolean	Calculated field set to true for only one of the objects per hash sharing the same parent	source.binaries.source.properties.insights.binaryParentUnique
analyticsHasChildren	automatically calculated	boolean	Calculated field set to true if the object has child objects	source.properties.insights.hasChildren
analyticsClassification.*		keyword	The results of the classification process based on the binaryExtension, grouped by binaryFileSize	source.binaries.source.properties.insights.analyticsClassification
analyticsTranslationCount	no	integer	Calculated field containing the number of times an object is translated
analyticsAvailableTranslations	no	keyword	Calculated field containing all available translations for an object
purviewAppliedLabel	no	keyword	The label applied by Purview	source.properties.purview.appliedLabel
purviewInformationTypes	no	keyword	The information types found by Purview	source.properties.purview.informationTypes
migrate	yes	boolean	Whether the document will be migrated	migration.migrate
migrationId	yes	keyword	The Id in the target system after migration	migration.id
migrationFailed	yes	boolean	Indicates if the migration failed for this document	migration.failed
migrationFailedMessage	yes	text	Indicates the reason for a failed migration	migration.failedMessage
metadata	no	object	Object field to store any additional metadata

note

When storing any additional fields in the metadata object, the type of the first value decides the type for this field. Changes in the field type will cause validation errors. For example:

The first document has a metadata.registrationDate field with the value "20th September 2021". The type for this field will be keyword as it is a string.
Later on, a document arrives with the field metadata.registrationDate, but in this case the value is "2021-09-20T21:50:28.342Z". The type for this value is date, which does not match the keyword type already set for the field. This will cause a validation error.

By default, the documents are ingested in batches of 1000 documents. If one of the documents fails, for example due to a validation error as described above, the whole batch is not imported.
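Because a single conflicting document blocks its whole batch, it can help to check custom metadata for mixed types before ingesting. The sketch below is a simplified pre-check, not Elasticsearch's own type detection: it treats only one timestamp format (ISO 8601 with milliseconds and a trailing Z) as a date, and the function names are hypothetical.

```python
from datetime import datetime


def value_kind(value):
    """Classify a value as "date" or "keyword" (strings), else its Python type name."""
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
            return "date"
        except ValueError:
            return "keyword"
    return type(value).__name__


def find_type_conflicts(documents):
    """Report metadata fields whose kind differs from the first value seen."""
    first_seen = {}
    conflicts = []
    for index, doc in enumerate(documents):
        for field, value in doc.get("metadata", {}).items():
            kind = value_kind(value)
            # The first value seen for a field decides the expected kind.
            expected = first_seen.setdefault(field, kind)
            if kind != expected:
                conflicts.append((index, field, expected, kind))
    return conflicts


documents = [
    {"metadata": {"registrationDate": "20th September 2021"}},
    {"metadata": {"registrationDate": "2021-09-20T21:50:28.342Z"}},
]
conflicts = find_type_conflicts(documents)
```

Each entry in `conflicts` holds the document position, the field name, the kind expected from the first value, and the kind actually found, so the offending values can be normalized before the ingest flow runs.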

For more information on the different field data types in Elasticsearch, please refer to the Elasticsearch field data types documentation.

Flows

This chapter describes the configuration of the flows Insights (1. Analysis) and Insights (2. Ingest).

Insights (1. Analysis) settings

Performs the analysis on the Content Store and enriches the data with insights.

mongoConnection

The Mongo connection string including the database name to connect to.
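Since the connection string must include the database name, a quick check before running the flow can catch a missing one. This is a minimal sketch using only the standard library; the host, port, and database name `contentstore` in the example are placeholders, not values from the accelerator.

```python
from urllib.parse import urlparse


def database_name(connection_string):
    """Return the database name from a MongoDB connection string, or None if absent."""
    # The database name is the path component that follows the host list.
    name = urlparse(connection_string).path.lstrip("/")
    return name or None


with_db = database_name("mongodb://localhost:27017/contentstore")
without_db = database_name("mongodb://localhost:27017")
```

Here `with_db` holds the database name and `without_db` is `None`, which signals that the connection string is incomplete for this setting.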

Insights (2. Ingest) settings

Ingests the data into Elasticsearch.

You can change which data is ingested by modifying the query of the Document Retrieve component in the flow.

The default query is:

{
  "kind": {
    "$in": ["CONTAINER", "RECORD"]
  },
  "source.versionInfo.isCurrent": true
}

You can extend this query to include additional filters. For example:

{
  "kind": {
    "$in": ["CONTAINER", "RECORD"]
  },
  "source.versionInfo.isCurrent": true,
  "source.contentType.systemName": "myContentType"
}

note

The isCurrent filter is required because the Include source versions checkbox is checked.
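When several filter variants are needed, the extended query can be generated instead of edited by hand. The sketch below combines the default query with extra filters and refuses to override the default keys, so the isCurrent filter is never dropped by accident. The helper name `extend_query` is hypothetical; the output is plain JSON that could be pasted into the Document Retrieve component.

```python
import json

# The default query of the Document Retrieve component, as shown above.
DEFAULT_QUERY = {
    "kind": {"$in": ["CONTAINER", "RECORD"]},
    "source.versionInfo.isCurrent": True,
}


def extend_query(extra_filters):
    """Return the default query combined with additional filters."""
    protected = set(DEFAULT_QUERY) & set(extra_filters)
    if protected:
        # Overriding a default key could silently remove the required filters.
        raise ValueError(f"refusing to override default filters: {sorted(protected)}")
    return {**DEFAULT_QUERY, **extra_filters}


query = extend_query({"source.contentType.systemName": "myContentType"})
print(json.dumps(query, indent=2))
```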

elasticsearchConnection

Elasticsearch connection string.

Example: http://localhost:9200

elasticsearchUsername

Elasticsearch username.

elasticsearchPassword

Elasticsearch password.

elasticsearchCertificatePath

Path to the Elasticsearch certificate.

Example: C:\certificates\elasticsearch\elasticsearch.crt

mongoConnection

The Mongo connection string including the database name to connect to.

indexName

The name of the index to use in Elasticsearch.

enableReset

A flag is set for each CONTAINER and RECORD document that has been ingested into Elasticsearch, which makes the flow resumable. When this option is set to true, that flag is reset. The default value is false.
