
Insights

The Insights accelerator is a collection of flows that can be used to analyze the data in the Content Store. To use the Insights accelerator, you need to have Elasticsearch installed and running. For more information on how to install and run Elasticsearch, please refer to the Elasticsearch documentation.

note

Requires Elasticsearch 8.8.1 or higher. It is recommended to run the Elasticsearch instance on a separate machine.

The first flow, Insights (1. Analysis), performs the analysis on the Content Store and enriches the data with insights. The second flow, Insights (2. Ingest), ingests the data into Elasticsearch.

note

One of the supported analyses is the analysis of duplicates. In order to calculate duplicates, the BINARY documents in the Content Store must contain a hash. If the BINARY documents do not contain a hash, the duplicates analysis will not work as expected. You can use the Calculate hash accelerator to calculate the hash of BINARY documents.

After the data is ingested into Elasticsearch, you can use Kibana to visualize it. For more information on how to install and run Kibana, please refer to the Kibana documentation. As part of this accelerator, a pre-configured set of Kibana dashboards is provided in the same folder as the accelerator. These dashboards can be imported into Kibana; for more information on how to import dashboards into Kibana, please refer to the Kibana documentation.

A wealth of insights can be obtained, including but not limited to:

  • Insights into the total number and size of the content
  • Insights into the structure of the content
  • Insights into versions and translations
  • Insights into duplicates
  • Insights into the quality and completeness of metadata
  • Insights into the content lifecycle

Re-running Insights

The first flow can be re-run to re-analyze the data in the Content Store. The second flow can be re-run to re-ingest the data into Elasticsearch. If you want to re-ingest the data into Elasticsearch, it is recommended to delete the index in Elasticsearch first. For more information on how to delete an index in Elasticsearch, please refer to the Elasticsearch documentation, or use Stack Management > Index Management in Kibana.

Metadata

Quality analysis

In addition to the pre-configured set of dashboards, Kibana offers a feature called Field Statistics that can be used to analyze metadata quality and completeness. For more information on how to use Field Statistics, please refer to the Kibana documentation.

Custom metadata

By default, the Insights accelerator is configured to analyze the standard metadata in the Content Store. If you want to analyze custom metadata, you can modify the Insights (2. Ingest) flow to include the custom metadata by editing the two Template Engine components in the flow. Store the custom metadata in the metadata key as an object. For example:

{
  "operation": ...,
  "data": {
    ... // Content Store metadata
    "metadata": {
      "customMetadata": "customMetadataValue"
    }
  }
}
note

The Insights accelerator has the following limitations:

  • Only the primary binary of a record is analyzed
  • If a document has multiple parents, only the first parent is analyzed

Elasticsearch data mapping

| name | mandatory | type | description | default Content Store mapping |
| --- | --- | --- | --- | --- |
| _id | yes | | Unique identifier | _id |
| id | yes | keyword | Id of object or full path if file system (same Id for different versions and languages) | source.versionInfo.seriesId |
| type | yes | keyword | Type of object, container for objects that can contain children | kind |
| subType | yes | keyword | SubType to specify a more detailed type, for example folder for container | source.contentType.systemName |
| isFile | yes | boolean | Specifies if an object is/contains a binary | hardcoded |
| source | yes | keyword | Name of the source system | migration.origin |
| name | yes | text | Name of the object, think name field or shortName field | source.name.systemName |
| name.keyword | automatically calculated | keyword | | |
| name.length | automatically calculated | token_count | | |
| name* | no | keyword | Additional fields that start with the prefix "name" will be automatically indexed as keywords to perform search and aggregations on | |
| description | no | text | Description of the object, think description field, longName field or title field | source.description |
| state | yes | keyword | One or more state values, think hidden, archived, deleted, etc. If none, provide an empty array | source.states |
| hierarchy | yes | text | Full and unique hierarchy to the object (using forward slashes) including the object; the path may consist of the full hierarchy of ids to the object separated by slashes or a file system path (different versions or language copies should be reflected in the hierarchy) | source.hierarchies.[0] |
| hierarchy.keyword | automatically calculated | keyword | | |
| hierarchy.length | automatically calculated | token_count | | |
| hierarchy.depth | automatically calculated | token_count | | |
| hierarchy.tree | automatically calculated | text | Can be used for special search use cases, please refer to the path hierarchy tokenizer documentation of Elasticsearch | |
| hierarchy.treeReversed | automatically calculated | text | Can be used for special search use cases, please refer to the path hierarchy tokenizer documentation of Elasticsearch | |
| url | no | keyword | Contains the full web url in case of an ECM system | |
| parent | yes | keyword | Full parent hierarchy | Parent path of source.hierarchies.[0] |
| parent.tree | automatically calculated | text | Can be used for aggregation on the structure | |
| parentId | no | keyword | Unique id of the parent object | source.parentIds.[0] |
| dateCreated | yes | date | Creation date of the object | source.created.date |
| dateModified | yes | date | Last modified date of the object, if not available set to dateCreated | source.lastModified.date |
| dateAccessed | no | date | Last accessed date of the object, if not available set to dateModified | source.lastAccessed.date |
| date* | no | date | Additional fields that start with the prefix "date" will be automatically indexed as dates to perform search and aggregations on | |
| principalCreated | no | text | Principal that created the object, think user or group | |
| principalCreated.keyword | automatically calculated | keyword | | |
| principalModified | no | text | Principal that last modified the object, think user or group | |
| principalModified.keyword | automatically calculated | keyword | | |
| principalAccessed | no | text | Principal that last accessed the object, think user or group | |
| principalAccessed.keyword | automatically calculated | keyword | | |
| principalAuthor | no | text | Principal that is the author of the object, think user or group | |
| principalAuthor.keyword | automatically calculated | keyword | | |
| principalOwner | no | text | Principal that is the owner of the object, think user or group | |
| principalOwner.keyword | automatically calculated | keyword | | |
| principal* | no | text | Additional fields that start with the prefix "principal" will be automatically indexed as keywords to perform search and aggregations on | |
| binaryExtension | yes (for files) | keyword | Extension of the binary, if empty then null | source.binaries.[0].source.rawExtension |
| binaryExtension.normal | automatically calculated | keyword | Normalized version of the value | |
| binaryExtension.length | automatically calculated | token_count | | |
| binaryByteSize | yes (for files) | long | Size in bytes of the binary | source.binaries.[0].source.byteSize |
| binaryHash | no | keyword | Hash of the binary | source.binaries.[0].source.hash |
| reversedVersionOrder | yes | integer | A number specifying the order of versions in a reversed manner | reversedVersionOrder |
| versionCount | yes | integer | Number of versions for the object, including the current version | versionCount |
| language | yes | keyword | Specifies the language of an object, if not available set to undefined | source.language.systemName |
| isOriginalLanguage | yes | boolean | Specifies if the object is in the original language (if not, it is a translation), if no language is available set to true | source.language.masterReference |
| metadata | no | object | Object field to store any additional metadata | |
| analyticsBinaryHashCount | automatically calculated | integer | Calculated field containing the amount of times a hash exists | source.binaries.[0].source.properties.insights.hashCount |
| analyticsBinaryUnique | automatically calculated | boolean | Calculated field set to true for only one of the objects per hash, otherwise false | source.binaries.[0].source.properties.insights.binaryUnique |
| analyticsBinaryParentUnique | automatically calculated | boolean | Calculated field set to true for only one of the objects per hash sharing the same parent, otherwise false | source.binaries.[0].source.properties.insights.binaryParentUnique |
| analyticsHasChildren | automatically calculated | boolean | Calculated field set to true if the object has child objects, otherwise false | source.properties.insights.hasChildren |
| analyticsClassification.* | | keyword | Object that contains the results of the classification process; currently the process categorizes the object based on the binary and extension and groups based on the binaryFileSize | source.binaries.[0].source.properties.insights.analyticsClassification |
| analyticsTranslationCount | no | integer | Calculated field containing the amount of times an object is translated | |
| analyticsAvailableTranslations | no | keyword | Calculated field containing all available translations for an object | |
| purviewAppliedLabel | no | keyword | The label applied by Purview | source.properties.purview.appliedLabel |
| purviewInformationTypes | no | keyword | The information types found by Purview | source.properties.purview.informationTypes |
| migrate | yes | boolean | Whether the document will be migrated | migration.migrate |
| migrationId | yes | keyword | The Id in the target system after migration | migration.id |
| migrationFailed | yes | boolean | Indicates if the migration failed for this document | migration.failed |
| migrationFailedMessage | yes | text | Indicates the reason of a failed migration | migration.failedMessage |
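As an illustration of the mapping above, a minimal indexed document could look like the following sketch. All field values are placeholders for illustration only, not actual accelerator output; automatically calculated fields are omitted.

{
  "id": "example-id",
  "type": "RECORD",
  "subType": "document",
  "isFile": true,
  "source": "example-source-system",
  "name": "example.docx",
  "state": [],
  "hierarchy": "/example-container-id/example-id",
  "parent": "/example-container-id",
  "parentId": "example-container-id",
  "dateCreated": "2020-01-01T00:00:00Z",
  "dateModified": "2020-01-01T00:00:00Z",
  "binaryExtension": "docx",
  "binaryByteSize": 12345,
  "binaryHash": "example-hash",
  "reversedVersionOrder": 1,
  "versionCount": 1,
  "language": "en",
  "isOriginalLanguage": true,
  "metadata": {
    "customMetadata": "customMetadataValue"
  },
  "migrate": true,
  "migrationId": "example-target-id",
  "migrationFailed": false,
  "migrationFailedMessage": ""
}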

Flows

Insights (1. Analysis) settings

Performs the analysis on the Content Store and enriches the data with insights.

mongoConnection

The Mongo connection string including the database name to connect to.
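Example (illustrative, substitute your own host and database name): mongodb://localhost:27017/contentstore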

Insights (2. Ingest) settings

Ingests the data into Elasticsearch.

You can change the data that is ingested by modifying the Document Retrieve component in the flow.

The default query is:

{
  "kind": {
    "$in": ["CONTAINER", "RECORD"]
  },
  "source.versionInfo.isCurrent": true
}

You can extend this query to include additional filters. For example:

{
  "kind": {
    "$in": ["CONTAINER", "RECORD"]
  },
  "source.versionInfo.isCurrent": true,
  "source.contentType.systemName": "myContentType"
}
note

The isCurrent filter is required because the checkbox Include source versions is checked.

elasticsearchConnection

Elasticsearch connection string

Example: http://localhost:9200

elasticsearchUsername

Elasticsearch username

elasticsearchPassword

Elasticsearch password

elasticsearchCertificatePath

Path to the Elasticsearch certificate

Example: C:\certificates\elasticsearch\elasticsearch.crt

mongoConnection

The Mongo connection string including the database name to connect to.

indexName

The name of the index to use in Elasticsearch.
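Example (illustrative, choose any valid Elasticsearch index name): insights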