Insights
The Insights accelerator is a collection of flows that can be used to analyze the data in the Content Store. To use the Insights accelerator, you need to have Elasticsearch installed and running. For more information on how to install and run Elasticsearch, please refer to the Elasticsearch documentation.
Requires Elasticsearch 8.8.1 or later, and it is recommended to run the Elasticsearch instance on a separate machine.
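As a quick sanity check before running the flows, you can verify that Elasticsearch is reachable and inspect its version with a small script. The sketch below is an illustration only, using the official Elasticsearch Python client; the connection string matches the example used later in this document, and the credentials are placeholders for your own deployment.

```python
# Minimal connectivity and version check with the official Elasticsearch
# Python client. URL and credentials are placeholders for your deployment.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme"),  # placeholder credentials
)

info = es.info()
print("Connected to Elasticsearch", info["version"]["number"])
```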
The first flow, Insights (1. Analysis), performs the analysis on the Content Store and enriches the data with insights. The second flow, Insights (2. Ingest), ingests the data into Elasticsearch.
One of the supported analyses is duplicate analysis. To calculate duplicates, the BINARY documents in the Content Store must contain a hash. If the BINARY documents do not contain a hash, the duplicates analysis will not work as expected. You can use the Calculate hash accelerator to calculate the hash of BINARY documents.
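The Calculate hash accelerator is the supported way to populate these hashes. Purely as an illustration of what such a hash is, the sketch below computes a streaming SHA-256 digest of a file; the algorithm, chunk size, and file path are assumptions and not necessarily what the accelerator uses.

```python
# Illustration only: computing a content hash for a binary file.
# The Calculate hash accelerator is the supported way to populate hashes;
# the SHA-256 algorithm and chunk size here are assumptions.
import hashlib


def file_hash(path: str, chunk_size: int = 1024 * 1024) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(file_hash("example.bin"))  # hypothetical file path
```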
After the data is ingested into Elasticsearch, you can use Kibana to visualize it. For more information on how to install and run Kibana, please refer to the Kibana documentation. As part of this accelerator, a pre-configured set of Kibana dashboards is provided in the same folder where you found this accelerator; these dashboards can be imported into Kibana. For more information on how to import dashboards into Kibana, please refer to the Kibana documentation.
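If you prefer to script the import instead of using the Kibana UI, a sketch along the following lines posts the dashboard export to Kibana's saved objects import API. The Kibana URL, credentials, and dashboard file name are placeholders, and the Kibana documentation remains the authoritative reference for this API.

```python
# Hedged sketch: import the provided dashboard export through Kibana's
# saved objects import API. URL, credentials, and file name are placeholders.
import requests

with open("insights-dashboards.ndjson", "rb") as dashboards:  # hypothetical file name
    response = requests.post(
        "http://localhost:5601/api/saved_objects/_import",
        params={"overwrite": "true"},
        headers={"kbn-xsrf": "true"},
        files={"file": dashboards},
        auth=("elastic", "changeme"),  # placeholder credentials
    )
response.raise_for_status()
print(response.json())
```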
A wealth of insights can be obtained, including but not limited to:
- Insights into the total number and size of the content (see the example query after this list)
- Insights into the structure of the content
- Insights into versions and translations
- Insights into duplicates
- Insights into the quality and completeness of metadata
- Insights into the content lifecycle
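As an illustration of the first bullet, a query along these lines sums the binaryByteSize field over all file objects once the index has been populated. The index name "insights" is a placeholder for the indexName you configure in the ingest flow.

```python
# Example of the first bullet: total number and combined size of all files,
# using the isFile and binaryByteSize fields from the data mapping below.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

result = es.search(
    index="insights",  # placeholder for the configured indexName
    size=0,
    query={"term": {"isFile": True}},
    aggs={"totalBytes": {"sum": {"field": "binaryByteSize"}}},
)
print("files:", result["hits"]["total"]["value"])
print("bytes:", result["aggregations"]["totalBytes"]["value"])
```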
Re-running Insights
The first flow can be re-run to re-analyze the data in the Content Store. The second flow can be re-run to re-ingest the data into Elasticsearch. If you want to re-ingest the data into Elasticsearch, it is recommended to delete the index in Elasticsearch first. For more information on how to delete an index in Elasticsearch, please refer to the Elasticsearch documentation, or use Stack Management > Index Management in Kibana.
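If you prefer to script the deletion rather than use Kibana, a minimal sketch with the Python Elasticsearch client could look as follows; "insights" is a placeholder for the configured indexName.

```python
# Minimal sketch: delete the Insights index before re-ingesting.
# "insights" is a placeholder for the indexName configured in the ingest flow.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.indices.delete(index="insights", ignore_unavailable=True)
```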
Metadata
Quality analysis
In addition to the pre-configured set of dashboards, Kibana offers a feature called Field Statistics that can be used to analyze metadata quality and completeness. For more information on how to use Field Statistics, please refer to the Kibana documentation.
Custom metadata
By default, the Insights accelerator is configured to analyze the data in the Content Store metadata. If you want to analyze custom metadata, you can modify the Insights (2. Ingest) flow to include the custom metadata by editing the two Template Engine components in the flow. Store the custom metadata in the metadata key as an object. For example:
{
    "operation": ...,
    "data": {
        ... // Content Store metadata
        "metadata": {
            "customMetadata": "customMetadataValue"
        }
    }
}
The Insights accelerator has the following limitations:
- Only the primary binary of a record is analyzed
- If a document has multiple parents, only the first parent is analyzed
Elasticsearch data mapping
name | mandatory | type | description | default Content Store mapping |
---|---|---|---|---|
_id | yes | | Unique identifier | _id |
id | yes | keyword | Id of object or full path if file system (same Id for different versions and languages) | source.versionInfo.seriesId |
type | yes | keyword | Type of object, container for objects that can contain children. | kind |
subType | yes | keyword | SubType to specify a more detailed type, for example folder for container | source.contentType.systemName |
isFile | yes | boolean | Specifies if an object is/contains a binary | hardcoded |
source | yes | keyword | Name of the source system | migration.origin |
name | yes | text | Name of the object, think name field or shortName field | source.name.systemName |
name.keyword | automatically calculated | keyword | ||
name.length | automatically calculated | token_count | ||
name* | no | keyword | Additional fields that start with the prefix "name" will be automatically indexed as keywords to perform search and aggregations on | |
description | no | text | Description of the object, think description field, longName field or title field | source.description |
state | yes | keyword | One or more state values, think hidden, archived, deleted, etc. If none, provide an empty array | source.states |
hierarchy | yes | text | Full and unique hierarchy to the object (using forward slashes) including the object itself; the path may consist of the full hierarchy of ids to the object separated by slashes, or a file system path (different versions or language copies should be reflected in the hierarchy) | source.hierarchies.[0] |
hierarchy.keyword | automatically calculated | keyword | ||
hierarchy.length | automatically calculated | token_count | ||
hierarchy.depth | automatically calculated | token_count | ||
hierarchy.tree | automatically calculated | text | Can be used for special search use cases; please refer to the path hierarchy tokenizer documentation of Elasticsearch | |
hierarchy.treeReversed | automatically calculated | text | Can be used for special search use cases; please refer to the path hierarchy tokenizer documentation of Elasticsearch | |
url | no | keyword | Contains the full web URL in the case of an ECM system. | |
parent | yes | keyword | Full parent hierarchy | Parent path of source.hierarchies.[0] |
parent.tree | automatically calculated | text | Can be used for aggregation on the structure | |
parentId | no | keyword | Unique id of the parent object | source.parentIds.[0] |
dateCreated | yes | date | Creation date of the object | source.created.date |
dateModified | yes | date | Last modified date of the object, if not available set to dateCreated | source.lastModified.date |
dateAccessed | no | date | Last accessed date of the object, if not available set to dateModified | source.lastAccessed.date |
date* | no | date | Additional fields that start with the prefix "date" will be automatically indexed as dates to perform search and aggregations on | |
principalCreated | no | text | Principal that created the object, think user or group | |
principalCreated.keyword | automatically calculated | keyword | ||
principalModified | no | text | Principal that last modified the object, think user or group | |
principalModified.keyword | automatically calculated | keyword | ||
principalAccessed | no | text | Principal that last accessed the object, think user or group | |
principalAccessed.keyword | automatically calculated | keyword | ||
principalAuthor | no | text | Principal that is the author of the object, think user or group | |
principalAuthor.keyword | automatically calculated | keyword | ||
principalOwner | no | text | Principal that is the owner of the object, think user or group | |
principalOwner.keyword | automatically calculated | keyword | ||
principal* | no | text | Additional fields that start with the prefix "principal" will be automatically indexed as keywords to perform search and aggregations on | |
binaryExtension | yes (for files) | keyword | Extension of the binary, if empty then null | source.binaries.[0].source.rawExtension |
binaryExtension.normal | automatically calculated | keyword | Normalized version of the value | |
binaryExtension.length | automatically calculated | token_count | ||
binaryByteSize | yes (for files) | long | Size in bytes of the binary | source.binaries.[0].source.byteSize |
binaryHash | no | keyword | Hash of the binary | source.binaries.[0].source.hash |
reversedVersionOrder | yes | integer | A number specifying the order of versions in reverse order | reversedVersionOrder |
versionCount | yes | integer | Number of versions for the object, including the current version | versionCount |
language | yes | keyword | Specifies the language of an object, if not available set to undefined | source.language.systemName |
isOriginalLanguage | yes | boolean | Specifies if the object is in the original language (if not, it is a translation), if no language is available set to true | source.language.masterReference |
metadata | no | object | Object field to store any additional metadata. | |
analyticsBinaryHashCount | automatically calculated | integer | Calculated field containing the number of times a hash occurs | source.binaries.[0].source.properties.insights.hashCount |
analyticsBinaryUnique | automatically calculated | boolean | Calculated field set to true for only one of the objects per hash, otherwise false | source.binaries.[0].source.properties.insights.binaryUnique |
analyticsBinaryParentUnique | automatically calculated | boolean | Calculated field set to true for only one of the objects per hash sharing the same parent, otherwise false | source.binaries.[0].source.properties.insights.binaryParentUnique |
analyticsHasChildren | automatically calculated | boolean | Calculated field set to true if the object has child objects, otherwise false | source.properties.insights.hasChildren |
analyticsClassification.* | | keyword | Object that contains the results of the classification process; currently the process categorizes the object based on the binary and its extension, and groups it based on the binary byte size | source.binaries.[0].source.properties.insights.analyticsClassification |
analyticsTranslationCount | no | integer | Calculated field containing the number of times an object is translated | |
analyticsAvailableTranslations | no | keyword | Calculated field containing all available translations for an object | |
purviewAppliedLabel | no | keyword | The label applied by Purview | source.properties.purview.appliedLabel |
purviewInformationTypes | no | keyword | The information types found by Purview | source.properties.purview.informationTypes |
migrate | yes | boolean | Whether the document will be migrated | migration.migrate |
migrationId | yes | keyword | The Id in the target system after migration | migration.id |
migrationFailed | yes | boolean | Indicates whether the migration failed for this document | migration.failed |
migrationFailedMessage | yes | text | Indicates the reason for a failed migration | migration.failedMessage |
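To make the mapping more concrete, the sketch below indexes a single hypothetical file object that fills in the mandatory fields from the table above. All values are made up for illustration, and the index name "insights" is a placeholder for the configured indexName; in normal use the Insights (2. Ingest) flow performs the ingestion, so this is only meant to show the shape of a document.

```python
# Hypothetical example of a file object that satisfies the mandatory fields
# of the data mapping above. All values are made up for illustration.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

document = {
    "id": "doc-001",
    "type": "RECORD",
    "subType": "document",
    "isFile": True,
    "source": "fileshare",
    "name": "quarterly-report.docx",
    "state": [],
    "hierarchy": "/root-id/folder-id/doc-001",
    "parent": "/root-id/folder-id",
    "dateCreated": "2023-01-15T10:30:00Z",
    "dateModified": "2023-02-01T09:00:00Z",
    "binaryExtension": "docx",
    "binaryByteSize": 204800,
    "reversedVersionOrder": 0,
    "versionCount": 1,
    "language": "undefined",
    "isOriginalLanguage": True,
    "migrate": True,
    "migrationId": "",
    "migrationFailed": False,
    "migrationFailedMessage": "",
}

es.index(index="insights", document=document)  # placeholder index name
```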
Flows
Insights (1. Analysis) settings
Performs the analysis on the Content Store and enriches the data with insights
mongoConnection
The Mongo connection string including the database name to connect to.
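Example (assuming a local MongoDB instance and a hypothetical database named contentStore): mongodb://localhost:27017/contentStore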
Insights (2. Ingest) settings
Ingests the data into Elasticsearch.
You can easily change the data that is ingested by modifying the Document Retrieve component in the flow.
The default query is:
{
    "kind": {
        "$in": ["CONTAINER", "RECORD"]
    },
    "source.versionInfo.isCurrent": true
}
You can extend this query to include additional filters. For example:
{
    "kind": {
        "$in": ["CONTAINER", "RECORD"]
    },
    "source.versionInfo.isCurrent": true,
    "source.contentType.systemName": "myContentType"
}
The isCurrent filter is required because the Include source versions checkbox is checked.
elasticsearchConnection
Elasticsearch connection string
Example: http://localhost:9200
elasticsearchUsername
Elasticsearch username
elasticsearchPassword
Elasticsearch password
elasticsearchCertificatePath
Path to the Elasticsearch certificate
Example: C:\certificates\elasticsearch\elasticsearch.crt
mongoConnection
The Mongo connection string including the database name to connect to.
indexName
The name of the index to use in Elasticsearch.