Box.com source connector
The Box.com connector uses the Box.com API. It consists of multiple flows, which are described below.
Features
- Exporting content structure metadata (files and folders including versions)
- Extracting custom metadata
- Downloading binaries
- Extracting collaborations
Flows
Box.com (Single user)
This flow exports a single user from a Box.com Enterprise environment. It cannot be used to extract multiple users; attempting to do so produces invalid exported data.
This flow uses impersonation to export a single user. The ID of the user to export is configured in the impersonationUserId setting. It is required to enable Perform Actions as Users in the Box.com configuration.
Box.com (1. Extract users)
This flow will extract all users from a Box.com Enterprise environment.
Box.com (2. Extract root folders)
This flow will extract all root folders from a Box.com Enterprise environment. It will use the users extracted in the previous flow.
This flow uses impersonation. It is required to enable Perform Actions as Users in the Box.com configuration.
Box.com (3. Traverse roots)
This flow will traverse all root folders from a Box.com Enterprise environment. It will use the root folders extracted in the previous flow.
This flow uses impersonation. It is required to enable Perform Actions as Users in the Box.com configuration.
For performance reasons, this flow is controlled by the Load balancer accelerator flow.
Set the objectsToAssignQuery to the query below and make sure to configure the availableWorkers to match the workerIds available for this flow.
{
"kind":"CONTAINER",
"source.properties.isOwner":true,
"source.parentIds.0":{"$exists":false}
}
Set workerThreshold to 1. This ensures that the load balancer assigns one root folder to each worker.
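To illustrate which documents the root folder query above is meant to select, the following is a minimal sketch of a matcher for the three conditions it contains. The document shapes and the helper names (resolve, matches) are hypothetical and only mirror the field paths used in the query. Real MongoDB matching has richer semantics than this simplified version.

```python
from typing import Any

_MISSING = object()  # sentinel for "path not present in the document"


def resolve(doc: Any, dotted_path: str) -> Any:
    """Follow a dotted path through dicts and lists; return _MISSING if absent."""
    current = doc
    for part in dotted_path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return _MISSING
    return current


def matches(doc: dict, query: dict) -> bool:
    """Simplified matcher: supports equality and {"$exists": bool} only."""
    for path, condition in query.items():
        value = resolve(doc, path)
        if isinstance(condition, dict) and "$exists" in condition:
            if (value is not _MISSING) != condition["$exists"]:
                return False
        elif value != condition:
            return False
    return True


ROOT_FOLDER_QUERY = {
    "kind": "CONTAINER",
    "source.properties.isOwner": True,
    "source.parentIds.0": {"$exists": False},
}

# Hypothetical documents, shaped only to exercise the query above.
documents = [
    {"_id": "root-a", "kind": "CONTAINER",
     "source": {"properties": {"isOwner": True}, "parentIds": []}},
    {"_id": "child", "kind": "CONTAINER",
     "source": {"properties": {"isOwner": True}, "parentIds": ["root-a"]}},
    {"_id": "shared", "kind": "CONTAINER",
     "source": {"properties": {"isOwner": False}, "parentIds": []}},
    {"_id": "file", "kind": "RECORD",
     "source": {"properties": {"isOwner": True}, "parentIds": ["root-a"]}},
]

selected = [d["_id"] for d in documents if matches(d, ROOT_FOLDER_QUERY)]
print(selected)
```

Only the owned container without a first parent ID is selected; child folders, folders owned by someone else, and records are skipped.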
Box.com (4. Set ACL)
This flow will link the ACL documents (collaborations) to the CONTAINER and RECORD documents in the Content Store.
Box.com (5. Download binaries)
This flow will download all binaries from a Box.com Enterprise environment. It will use BINARY documents extracted in the previous flows.
This flow uses the GCM scope. Make sure it is enabled in the Box.com configuration.
For performance reasons, this flow is controlled by the Load balancer accelerator flow.
Set the objectsToAssignQuery to the query below and make sure to configure the availableWorkers to match the workerIds available for this flow.
{
"kind" : "BINARY",
"source.externalReference" : {"$exists" : true},
"source.localReference" : {"$exists" : false}
}
Set workerThreshold to 100. This ensures that the load balancer assigns a hundred BINARY documents to each worker.
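The effect of workerThreshold can be pictured with a small sketch. This is an illustration under assumptions, not the Load balancer's actual algorithm: the function name assign_up_to_threshold is hypothetical, and the real accelerator flow may, for example, top up workers as they finish.

```python
from typing import Dict, List, Tuple


def assign_up_to_threshold(
    object_ids: List[str],
    available_workers: List[str],
    worker_threshold: int,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Give each worker at most worker_threshold objects, in order.

    Returns the per-worker assignments and the objects left unassigned.
    """
    assignments: Dict[str, List[str]] = {}
    remaining = list(object_ids)
    for worker in available_workers:
        assignments[worker] = remaining[:worker_threshold]
        remaining = remaining[worker_threshold:]
    return assignments, remaining


# Hypothetical worker IDs and 250 pending BINARY document IDs.
workers = ["worker-1", "worker-2"]
binaries = [f"binary-{i}" for i in range(250)]

# Threshold 100, as suggested for the download flow.
batches, left_over = assign_up_to_threshold(binaries, workers, 100)

# Threshold 1, as suggested for the traverse flow (one root folder per worker).
roots, waiting = assign_up_to_threshold(["root-a", "root-b", "root-c"], workers, 1)
```

In this sketch each worker receives 100 binaries and 50 stay unassigned until a later round; with a threshold of 1, each worker receives a single root folder.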
Settings
This section describes the settings for the Box.com connector flows. Not all settings are used in all flows.
mongoConnection
The Mongo connection string including the database name to connect to.
boxClientId
The client ID from the Box.com JSON configuration.
boxClientSecret
The client secret from the Box.com JSON configuration.
boxPublicKeyId
The public key ID from the Box.com JSON configuration.
boxPrivateKey
The private key from the Box.com JSON configuration.
boxPassphrase
The passphrase for the private key, from the Box.com JSON configuration.
boxEnterpriseId
The enterprise ID from the Box.com JSON configuration.
Note: The Box.com connector uses the Box.com JWT authentication mechanism. Refer to the Box.com documentation for how to set this up.
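As a convenience, the six box* settings can be read from the JSON configuration file. The sketch below assumes the file follows the layout that the Box developer console generates for JWT applications (boxAppSettings, appAuth, enterpriseID); verify this against your own file. The function name settings_from_box_config is hypothetical and the sample values are placeholders.

```python
import json
from typing import Dict


def settings_from_box_config(config: dict) -> Dict[str, str]:
    """Map a Box JWT JSON configuration onto the connector's box* settings.

    Assumes the key layout of the file downloaded from the Box developer console.
    """
    app = config["boxAppSettings"]
    auth = app["appAuth"]
    return {
        "boxClientId": app["clientID"],
        "boxClientSecret": app["clientSecret"],
        "boxPublicKeyId": auth["publicKeyID"],
        "boxPrivateKey": auth["privateKey"],
        "boxPassphrase": auth["passphrase"],
        "boxEnterpriseId": config["enterpriseID"],
    }


# Placeholder configuration; real files contain your application's secrets.
sample_config = json.loads("""
{
  "boxAppSettings": {
    "clientID": "example-client-id",
    "clientSecret": "example-client-secret",
    "appAuth": {
      "publicKeyID": "example-key-id",
      "privateKey": "example-private-key",
      "passphrase": "example-passphrase"
    }
  },
  "enterpriseID": "example-enterprise-id"
}
""")

settings = settings_from_box_config(sample_config)
```

To use it with a real file, load the file with json.load and pass the resulting dictionary to the function. Treat the output as sensitive, since it contains the private key and passphrase.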
origin
Specifies the origin of the document in the Content Store.
impersonationUserId
The ID of the user to impersonate.
rootIds
A comma-separated list of root IDs.
Example: 162658461779, 162661914619, 162661534725
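A value in this format can be split into individual IDs as sketched below. The helper parse_root_ids is hypothetical, and whether the connector itself tolerates spaces after the commas is an assumption based on the example above.

```python
from typing import List


def parse_root_ids(raw: str) -> List[str]:
    """Split a comma-separated rootIds value, trimming whitespace and empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


root_ids = parse_root_ids("162658461779, 162661914619, 162661534725")
print(root_ids)
```

The IDs are kept as strings so that they are passed on exactly as entered.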
workerId
An identifier for this worker. Use this to identify the worker in the Load balancer accelerator flow.