Insights
The Insights accelerator is a collection of flows that can be used to analyze the data in the Content Store. To use the Insights accelerator, you need to have Elasticsearch installed and running. For more information on how to install and run Elasticsearch, please refer to the Elasticsearch documentation.
Requires Elasticsearch 8.8.1 or later, and it is recommended to run the Elasticsearch instance on a separate machine.
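As a quick sanity check before running the flows, you can verify that Elasticsearch is reachable and inspect its version with a small script. The sketch below is an illustration only, using the official Elasticsearch Python client; the connection string matches the example used later in this document, and the credentials are placeholders for your own deployment.

```python
# Minimal connectivity and version check with the official Elasticsearch
# Python client. URL and credentials are placeholders for your deployment.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "http://localhost:9200",
    basic_auth=("elastic", "changeme"),  # placeholder credentials
)

info = es.info()
print("Connected to Elasticsearch", info["version"]["number"])
```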
The first flow, Insights (1. Analysis), performs the analysis on the Content Store and enriches the data with insights. The second flow, Insights (2. Ingest), ingests the data into Elasticsearch.
One of the supported analyses is duplicate analysis. To calculate duplicates, the BINARY documents in the Content Store must contain a hash. If the BINARY documents do not contain a hash, the duplicates analysis will not work as expected. You can use the Calculate hash accelerator to calculate the hash of BINARY documents.
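The Calculate hash accelerator is the supported way to populate these hashes. Purely as an illustration of what such a hash is, the sketch below computes a streaming SHA-256 digest of a file; the algorithm, chunk size, and file path are assumptions and not necessarily what the accelerator uses.

```python
# Illustration only: computing a content hash for a binary file.
# The Calculate hash accelerator is the supported way to populate hashes;
# the SHA-256 algorithm and chunk size here are assumptions.
import hashlib


def file_hash(path: str, chunk_size: int = 1024 * 1024) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(file_hash("example.bin"))  # hypothetical file path
```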
After the data is ingested into Elasticsearch, you can use Kibana to visualize it. For more information on how to install and run Kibana, please refer to the Kibana documentation. As part of this accelerator, a pre-configured set of Kibana dashboards is provided in the same folder where you found this accelerator; these dashboards can be imported into Kibana. For more information on how to import dashboards into Kibana, please refer to the Kibana documentation.
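If you prefer to script the import instead of using the Kibana UI, a sketch along the following lines posts the dashboard export to Kibana's saved objects import API. The Kibana URL, credentials, and dashboard file name are placeholders, and the Kibana documentation remains the authoritative reference for this API.

```python
# Hedged sketch: import the provided dashboard export through Kibana's
# saved objects import API. URL, credentials, and file name are placeholders.
import requests

with open("insights-dashboards.ndjson", "rb") as dashboards:  # hypothetical file name
    response = requests.post(
        "http://localhost:5601/api/saved_objects/_import",
        params={"overwrite": "true"},
        headers={"kbn-xsrf": "true"},
        files={"file": dashboards},
        auth=("elastic", "changeme"),  # placeholder credentials
    )
response.raise_for_status()
print(response.json())
```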
A wealth of insights can be obtained, including but not limited to:
- Insights into the total number and size of the content (see the example query after this list)
- Insights into the structure of the content
- Insights into versions and translations
- Insights into duplicates
- Insights into the quality and completeness of metadata
- Insights into the content lifecycle
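As an illustration of the first bullet, a query along these lines sums the binaryByteSize field over all file objects once the index has been populated. The index name "insights" is a placeholder for the indexName you configure in the ingest flow.

```python
# Example of the first bullet: total number and combined size of all files,
# using the isFile and binaryByteSize fields from the data mapping below.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

result = es.search(
    index="insights",  # placeholder for the configured indexName
    size=0,
    query={"term": {"isFile": True}},
    aggs={"totalBytes": {"sum": {"field": "binaryByteSize"}}},
)
print("files:", result["hits"]["total"]["value"])
print("bytes:", result["aggregations"]["totalBytes"]["value"])
```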
Re-running Insights
The first flow can be re-run to re-analyze the data in the Content Store. The second flow can be re-run to re-ingest the data into Elasticsearch. If you want to re-ingest the data into Elasticsearch, it is recommended to delete the index in Elasticsearch first. For more information on how to delete an index in Elasticsearch, please refer to the Elasticsearch documentation, or use Stack Management > Index Management in Kibana.
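If you prefer to script the deletion rather than use Kibana, a minimal sketch with the Python Elasticsearch client could look as follows; "insights" is a placeholder for the configured indexName.

```python
# Minimal sketch: delete the Insights index before re-ingesting.
# "insights" is a placeholder for the indexName configured in the ingest flow.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.indices.delete(index="insights", ignore_unavailable=True)
```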
Metadata
Quality analysis
In addition to the pre-configured set of dashboards, Kibana offers a feature called Field Statistics that can be used to analyze metadata quality and completeness. For more information on how to use Field Statistics, please refer to the Kibana documentation.
Custom metadata
By default, the Insights accelerator is configured to analyze the data in the Content Store metadata. If you want to analyze custom metadata, you can modify the Insights (2. Ingest) flow to include the custom metadata by editing the two Template Engine components in the flow. Store the custom metadata in the metadata key as an object. For example:
{
    "operation": ...,
    "data": {
        ... // Content Store metadata
        "metadata": {
            "customMetadata": "customMetadataValue"
        }
    }
}
The Insights accelerator has the following limitations:
- Only the primary binary of a record is analyzed
- If a document has multiple parents, only the first parent is analyzed
Elasticsearch data mapping
name | mandatory | type | description | default Content Store mapping |
---|---|---|---|---|
_id | yes | | Unique identifier | _id |
id | yes | keyword | Id of object or full path if file system (same Id for different versions and languages) | source.versionInfo.seriesId |
type | yes | keyword | Type of object, container for objects that can contain children. | kind |
subType | yes | keyword | SubType to specify a more detailed type, for example folder for container | source.contentType.systemName |
isFile | yes | boolean | Specifies if an object is/contains a binary | hardcoded |
source | yes | keyword | Name of the source system | migration.origin |
name | yes | text | Name of the object, think name field or shortName field | source.name.systemName |
name.keyword | automatically calculated | keyword | ||
name.length | automatically calculated | token_count | ||
name* | no | keyword | Additional fields that start with the prefix "name" will be automatically indexed as keywords to perform search and aggregations on | |
description | no | text | Description of the object, think description field, longName field or title field | source.description |
state | yes | keyword | One or more state values, think hidden, archived, deleted, etc. If none, provide an empty array | source.states |
hierarchy | yes | text | Full and unique hierarchy to the object (using forward slashes) including the object itself; the path may consist of the full hierarchy of ids to the object separated by slashes, or a file system path (different versions or language copies should be reflected in the hierarchy) | source.hierarchies.[0] |
hierarchy.keyword | automatically calculated | keyword | ||
hierarchy.length | automatically calculated | token_count | ||
hierarchy.depth | automatically calculated | token_count | ||
hierarchy.tree | automatically calculated | text | Can be used for special search use cases; please refer to the path hierarchy tokenizer documentation of Elasticsearch | |
hierarchy.treeReversed | automatically calculated | text | Can be used for special search use cases; please refer to the path hierarchy tokenizer documentation of Elasticsearch | |
url | no | keyword | Contains the full web URL in the case of an ECM system. | |
parent | yes | keyword | Full parent hierarchy | Parent path of source.hierarchies.[0] |
parent.tree | automatically calculated | text | Can be used for aggregation on the structure | |
parentId | no | keyword | Unique id of the parent object | source.parentIds.[0] |
dateCreated | yes | date | Creation date of the object | source.created.date |
dateModified | yes | date | Last modified date of the object, if not available set to dateCreated | source.lastModified.date |
dateAccessed | no | date | Last accessed date of the object, if not available set to dateModified | source.lastAccessed.date |
date* | no | date | Additional fields that start with the prefix "date" will be automatically indexed as dates to perform search and aggregations on | |
principalCreated | no | text | Principal that created the object, think user or group | |
principalCreated.keyword | automatically calculated | keyword | ||
principalModified | no | text | Principal that last modified the object, think user or group | |
principalModified.keyword | automatically calculated | keyword | ||
principalAccessed | no | text | Principal that last accessed the object, think user or group | |
principalAccessed.keyword | automatically calculated | keyword | ||
principalAuthor | no | text | Principal that is the author of the object, think user or group | |
principalAuthor.keyword | automatically calculated | keyword | ||
principalOwner | no | text | Principal that is the owner of the object, think user or group | |
principalOwner.keyword | automatically calculated | keyword | ||
principal* | no | text | Additional fields that start with the prefix "principal" will be automatically indexed as keywords to perform search and aggregations on | |
binaryExtension | yes (for files) | keyword | Extension of the binary, if empty then null | source.binaries.[0].source.rawExtension |
binaryExtension.normal | automatically calculated | keyword | Normalized version of the value | |
binaryExtension.length | automatically calculated | token_count | ||
binaryByteSize | yes (for files) | long | Size in bytes of the binary | source.binaries.[0].source.byteSize |
binaryHash | no | keyword | Hash of the binary | source.binaries.[0].source.hash |
reversedVersionOrder | yes | integer | A number specifying the order of versions in reverse order | reversedVersionOrder |
versionCount | yes | integer | Number of versions for the object, including the current version | versionCount |
language | yes | keyword | Specifies the language of an object, if not available set to undefined | source.language.systemName |
isOriginalLanguage | yes | boolean | Specifies if the object is in the original language (if not, it is a translation), if no language is available set to true | source.language.masterReference |
metadata | no | object | Object field to store any additional metadata. | |
analyticsBinaryHashCount | automatically calculated | integer | Calculated field containing the number of times a hash occurs | source.binaries.[0].source.properties.insights.hashCount |
analyticsBinaryUnique | automatically calculated | boolean | Calculated field set to true for only one of the objects per hash, otherwise false | source.binaries.[0].source.properties.insights.binaryUnique |
analyticsBinaryParentUnique | automatically calculated | boolean | Calculated field set to true for only one of the objects per hash sharing the same parent, otherwise false | source.binaries.[0].source.properties.insights.binaryParentUnique |
analyticsHasChildren | automatically calculated | boolean | Calculated field set to true if the object has child objects, otherwise false | source.properties.insights.hasChildren |
analyticsClassification.* | | keyword | Object that contains the results of the classification process; currently the process categorizes the object based on the binary and its extension, and groups it based on the binary byte size | source.binaries.[0].source.properties.insights.analyticsClassification |
analyticsTranslationCount | no | integer | Calculated field containing the number of times an object is translated | |
analyticsAvailableTranslations | no | keyword | Calculated field containing all available translations for an object | |
purviewAppliedLabel | no | keyword | The label applied by Purview | source.properties.purview.appliedLabel |
purviewInformationTypes | no | keyword | The information types found by Purview | source.properties.purview.informationTypes |
migrate | yes | boolean | Whether the document will be migrated | migration.migrate |
migrationId | yes | keyword | The Id in the target system after migration | migration.id |
migrationFailed | yes | boolean | Indicates whether the migration failed for this document | migration.failed |
migrationFailedMessage | yes | text | Indicates the reason for a failed migration | migration.failedMessage |
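To make the mapping more concrete, the sketch below indexes a single hypothetical file object that fills in the mandatory fields from the table above. All values are made up for illustration, and the index name "insights" is a placeholder for the configured indexName; in normal use the Insights (2. Ingest) flow performs the ingestion, so this is only meant to show the shape of a document.

```python
# Hypothetical example of a file object that satisfies the mandatory fields
# of the data mapping above. All values are made up for illustration.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

document = {
    "id": "doc-001",
    "type": "RECORD",
    "subType": "document",
    "isFile": True,
    "source": "fileshare",
    "name": "quarterly-report.docx",
    "state": [],
    "hierarchy": "/root-id/folder-id/doc-001",
    "parent": "/root-id/folder-id",
    "dateCreated": "2023-01-15T10:30:00Z",
    "dateModified": "2023-02-01T09:00:00Z",
    "binaryExtension": "docx",
    "binaryByteSize": 204800,
    "reversedVersionOrder": 0,
    "versionCount": 1,
    "language": "undefined",
    "isOriginalLanguage": True,
    "migrate": True,
    "migrationId": "",
    "migrationFailed": False,
    "migrationFailedMessage": "",
}

es.index(index="insights", document=document)  # placeholder index name
```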
Flows
Insights (1. Analysis) settings
Performs the analysis on the Content Store and enriches the data with insights
mongoConnection
The Mongo connection string including the database name to connect to.
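Example (assuming a local MongoDB instance and a hypothetical database named contentStore): mongodb://localhost:27017/contentStore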
Insights (2. Ingest) settings
Ingests the data into Elasticsearch.
You can easily change the data that is ingested by modifying the Document Retrieve component in the flow.
The default query is:
{
    "kind": {
        "$in": ["CONTAINER", "RECORD"]
    },
    "source.versionInfo.isCurrent": true
}
You can extend this query to include additional filters. For example:
{
    "kind": {
        "$in": ["CONTAINER", "RECORD"]
    },
    "source.versionInfo.isCurrent": true,
    "source.contentType.systemName": "myContentType"
}
The isCurrent filter is required because the Include source versions checkbox is checked.
elasticsearchConnection
Elasticsearch connection string
Example: http://localhost:9200
elasticsearchUsername
Elasticsearch username
elasticsearchPassword
Elasticsearch password
elasticsearchCertificatePath
Path to the Elasticsearch certificate
Example: C:\certificates\elasticsearch\elasticsearch.crt
mongoConnection
The Mongo connection string including the database name to connect to.
indexName
The name of the index to use in Elasticsearch.