Version: 4.48.0

Document Aggregate

The Document Aggregate component allows you to query for documents using an aggregation pipeline.

The aggregation is executed on the 'documents' collection.

An aggregation pipeline consists of one or more stages that process documents:

  • Each stage performs an operation on the collection documents. For example, a stage can filter documents, group documents, and calculate values.
  • The documents that are output from a stage are passed to the next stage.
  • An aggregation pipeline can return results for groups of documents. For example, return the total, average, maximum, and minimum values.
  • Dynamic values are allowed in the pipeline using dot notation (see the example below).
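The stage-chaining model described above can be sketched in plain Python. This is an illustrative in-memory simulation, not the component's implementation; the documents, field names, and helper functions are hypothetical:

```python
# Sketch of pipeline chaining: each stage is a function that takes a list of
# documents and returns the list that is passed to the next stage.

def match_stage(docs, criteria):
    # Like $match: keep only documents whose fields equal the criteria values.
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

def group_stage(docs, key):
    # Like $group with a $sum accumulator: one output document per distinct key.
    groups = {}
    for d in docs:
        g = groups.setdefault(d[key], {"_id": d[key], "count": 0})
        g["count"] += 1
    return list(groups.values())

docs = [
    {"kind": "BINARY", "ext": "json"},
    {"kind": "BINARY", "ext": "zip"},
    {"kind": "FOLDER", "ext": None},
]

# Run the stages in order; each stage's output feeds the next stage.
out = group_stage(match_stage(docs, {"kind": "BINARY"}), "ext")
```

Only the two "BINARY" documents survive the match stage, so the group stage outputs one document per distinct extension found among them.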

Stages

$addFields: Adds new fields to documents. Similar to $project, $addFields reshapes each document in the stream; specifically, by adding new fields to output documents that contain both the existing fields from the input documents and the newly added fields. $set is an alias for $addFields.

$bucket: Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries.

$bucketAuto: Categorizes incoming documents into a specific number of groups, called buckets, based on a specified expression. Bucket boundaries are automatically determined in an attempt to evenly distribute the documents into the specified number of buckets.

$collStats: Returns statistics regarding a collection or view.

$count: Returns a count of the number of documents at this stage of the aggregation pipeline. Distinct from the $count aggregation accumulator.

$facet: Processes multiple aggregation pipelines within a single stage on the same set of input documents. Enables the creation of multi-faceted aggregations capable of characterizing data across multiple dimensions, or facets, in a single stage.

$geoNear: Returns an ordered stream of documents based on the proximity to a geospatial point. Incorporates the functionality of $match, $sort, and $limit for geospatial data. The output documents include an additional distance field and can include a location identifier field.

$graphLookup: Performs a recursive search on a collection. To each output document, adds a new array field that contains the traversal results of the recursive search for that document.

$group: Groups input documents by a specified identifier expression and applies the accumulator expression(s), if specified, to each group. Consumes all input documents and outputs one document per distinct group. The output documents only contain the identifier field and, if specified, accumulated fields.

$indexStats: Returns statistics regarding the use of each index for the collection.

$limit: Passes the first n documents unmodified to the pipeline where n is the specified limit. For each input document, outputs either one document (for the first n documents) or zero documents (after the first n documents).

$listSessions: Lists all sessions that have been active long enough to propagate to the system.sessions collection.

$lookup: Performs a left outer join to another collection in the same database to filter in documents from the "joined" collection for processing.

$match: Filters the document stream to allow only matching documents to pass unmodified into the next pipeline stage. $match uses standard MongoDB queries. For each input document, outputs either one document (a match) or zero documents (no match).

$merge: Writes the resulting documents of the aggregation pipeline to a collection. The stage can incorporate the results into an output collection by inserting new documents, merging documents, replacing documents, keeping existing documents, failing the operation, or processing documents with a custom update pipeline. The $merge stage must be the last stage in the pipeline.

$out: Writes the resulting documents of the aggregation pipeline to a collection. To use the $out stage, it must be the last stage in the pipeline.

$planCacheStats: Returns plan cache information for a collection.

$project: Reshapes each document in the stream, such as by adding new fields or removing existing fields. For each input document, outputs one document. See also $unset for removing existing fields.

$redact: Reshapes each document in the stream by restricting the content for each document based on information stored in the documents themselves. Incorporates the functionality of $project and $match. Can be used to implement field level redaction. For each input document, outputs either one or zero documents.

$replaceRoot: Replaces a document with the specified embedded document. The operation replaces all existing fields in the input document, including the _id field. Specify a document embedded in the input document to promote the embedded document to the top level.

$replaceWith: Replaces a document with the specified embedded document. The operation replaces all existing fields in the input document, including the _id field. Specify a document embedded in the input document to promote the embedded document to the top level. $replaceWith is an alias for $replaceRoot stage.

$sample: Randomly selects the specified number of documents from its input.

$search: Performs a full-text search of the field or fields in an Atlas collection. NOTE: $search is only available for MongoDB Atlas clusters, and is not available for self-managed deployments.

$set: Adds new fields to documents. Similar to $project, $set reshapes each document in the stream; specifically, by adding new fields to output documents that contain both the existing fields from the input documents and the newly added fields. $set is an alias for the $addFields stage.

$setWindowFields: Groups documents into windows and applies one or more operators to the documents in each window.

$skip: Skips the first n documents where n is the specified skip number and passes the remaining documents unmodified to the pipeline. For each input document, outputs either zero documents (for the first n documents) or one document (after the first n documents).

$sort: Reorders the document stream by a specified sort key. Only the order changes; the documents remain unmodified. For each input document, outputs one document.

$sortByCount: Groups incoming documents based on the value of a specified expression, then computes the count of documents in each distinct group.

$unionWith: Performs a union of two collections; i.e. combines pipeline results from two collections into a single result set.

$unset: Removes/excludes fields from documents. $unset is an alias for $project stage that removes fields.

$unwind: Deconstructs an array field from the input documents to output a document for each element. Each output document replaces the array with an element value. For each input document, outputs n documents where n is the number of array elements and can be zero for an empty array.
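The $unwind semantics described above can be sketched in plain Python. This is an illustrative stand-in, not the actual stage implementation; the sample documents and field names are hypothetical:

```python
def unwind(docs, field):
    # Like $unwind: emit one output document per array element, replacing the
    # array with the element value. Empty arrays yield zero output documents.
    out = []
    for d in docs:
        for value in d.get(field, []):
            copy = dict(d)
            copy[field] = value
            out.append(copy)
    return out

docs = [
    {"name": "a", "tags": ["x", "y"]},  # produces two output documents
    {"name": "b", "tags": []},          # produces zero output documents
]
result = unwind(docs, "tags")
```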

For all aggregation pipeline operators, check the documentation from MongoDB.

Handlebars

This component lets you use Handlebars templates. More information about Handlebars can be found in this section.

Configuration

Connection string

A MongoDB connection string.

Example: mongodb://<username>:<password>@localhost:27017/<databaseName>

Here <databaseName> is the database to store content.
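A quick way to see which parts of such a connection string end up where is to split it with Python's standard library. The URI below uses hypothetical credentials and database name; placeholders like <username> must be replaced with real values before use:

```python
from urllib.parse import urlsplit

# Hypothetical connection string in the format shown above.
uri = "mongodb://alice:s3cret@localhost:27017/contentdb"

parts = urlsplit(uri)
host, port = parts.hostname, parts.port
database = parts.path.lstrip("/")  # the database used to store content
```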

Use TLS

Whether to use TLS, in case your MongoDB deployment requires it.

Allow Invalid Certificates

Checking this will disable certificate validation. Warning: specifying this option in a production environment makes your application insecure and potentially vulnerable to expired certificates and to foreign processes posing as valid client instances.

Certificate Authority File

One or more certificate authorities to trust when making a TLS connection. In order to access the local filesystem, the XILL4_WORKDIRS environment variable must be set to the path of the directory to be accessed.

Example: .\ca.pem

Aggregation

Aggregation pipeline. When left empty, the aggregation is translated to an empty array, which returns all documents from the collection.

Example: [{"$match": { "name":"John" }}]

Enable allowDiskUse

By enabling allowDiskUse, MongoDB can process the sort operation even if it requires more than 100 megabytes of system memory. If this option is disabled and the operation requires more than 100 megabytes of system memory, MongoDB returns an error: Executor error during find command :: caused by :: Sort exceeded memory limit

Enable case insensitive collation settings

By default, MongoDB sorts uppercase characters before lowercase characters. Enabling case insensitive collation settings disables this behavior.

Note that this feature decreases the performance of the query and should only be used when case insensitivity matters.

Rate limit

The rate limit settings are used to throttle the number of documents that are sent into the flow. The minimum interval is set at 10 milliseconds. The minimum batch size is 1 outgoing message per interval.

Batch size

The number of documents that are sent per batch.

Interval

The interval in milliseconds in which the batches are sent.
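Together, the batch size and interval put a ceiling on throughput; a quick back-of-the-envelope calculation in Python (the numbers below are hypothetical, chosen only for illustration):

```python
# Throughput ceiling = batch_size documents every interval_ms milliseconds.
batch_size = 50     # documents per outgoing batch (hypothetical value)
interval_ms = 100   # milliseconds between batches (the minimum is 10)

docs_per_second = batch_size * 1000 / interval_ms
```

With these example numbers, at most 500 documents per second are sent into the flow.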

Inputs

An example using the $match and $group stages to calculate the number of files and the total size of the corresponding binary documents.

Incoming message:

{
  "kind": "BINARY"
}

Dynamic values can be used in the aggregation, referenced with {{key}} as in the example below with kind.

Configuration aggregation:

[
  {
    "$match": {
      "kind": "{{kind}}"
    }
  },
  {
    "$group": {
      "_id": "$source.extension",
      "count": { "$sum": 1 },
      "totalSize": { "$sum": "$source.byteSize" }
    }
  }
]
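The placeholder substitution described above can be sketched in plain Python. This is a simplified stand-in for the component's Handlebars templating, shown only to illustrate how {{key}} is resolved from the incoming message before the pipeline runs:

```python
import json
import re

# Incoming message and aggregation template from the example above.
message = {"kind": "BINARY"}
template = '[{"$match": {"kind": "{{kind}}"}}]'

# Replace each {{key}} placeholder with the corresponding message value,
# then parse the rendered string as the JSON pipeline.
rendered = re.sub(r"\{\{(\w+)\}\}", lambda m: str(message[m.group(1)]), template)
pipeline = json.loads(rendered)
```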

Outputs

Example outputs using the input from above, assuming the database holds three binary documents with "json" or "zip" as "source.extension".

First output:

{
  "value": {},
  "result": {
    "_id": "json",
    "count": 2,
    "totalSize": 191012968
  }
}

Second output:

{
  "value": {},
  "result": {
    "_id": "zip",
    "count": 1,
    "totalSize": 751
  }
}