SharePoint Online source connector
The SharePoint Online source connector consists of multiple flows, which are described below. These flows use the MS Graph API v1 and the SharePoint Online REST API.
Features
- Exporting content structure
- Exporting users
- Exporting groups
- Exporting user OneDrives
- Exporting content types
- Exporting managed metadata
Requirements
For the connector to work, an Azure application with certain permissions is required. An Azure administrator is needed to create this application and to grant permissions.
Graph API
In order to connect to the Graph API an application needs to be registered using the Azure portal. For more information see MS Graph API v1 authentication.
The following Graph API permissions are required using the Application Permissions tab:
- Sites.Selected for specific sites or Sites.Read.All for the entire tenant
- User.Read.All
- Group.Read.All (optional for retrieving groups)
- TermStore.Read.All (optional for retrieving managed metadata)
When limited site access is given with the Sites.Selected permission, an administrator has to grant access to the sites. For example, this PnP PowerShell command can be used: Grant-PnPAzureADAppSitePermission. Example script using a certificate:
# Azure application with Graph API `Sites.FullControl.All` permission
$clientId = '5f5b92f0-2097-4a14-a152-3e8172951d67'
$certificatePassword = (ConvertTo-SecureString -AsPlainText 'certificate_password' -Force)
# connect to SPO using a certificate
Connect-PnPOnline -Url "https://contoso.sharepoint.com" -ClientId $clientId -CertificatePath 'C:/spo/certificate.pfx' -CertificatePassword $certificatePassword -Tenant 'contoso.onmicrosoft.com'
# grant permission to the site for the Azure application that the connector will use
$SiteURL = "https://contoso.sharepoint.com/sites/test"
Grant-PnPAzureADAppSitePermission -AppId '9114b753-f288-41be-b81c-16afaa7c79ae' -DisplayName 'Xill4' -Site $SiteURL -Permissions 'FullControl'
For more information, see Updates on controlling app specific access on specific SharePoint sites (Sites.Selected).
Furthermore, a self-signed certificate will have to be generated, which is then registered in the Azure application.
Creating and Exporting a Self-Signed Certificate
For detailed instructions on creating a self-signed public certificate and exporting it along with its private key, refer to the Microsoft documentation.
Once the public certificate with its private key is exported in a .pfx file, an additional step is required to extract the private key and certificate for use in your flow.
Extracting the Private Key in PowerShell
- Navigate to the directory containing the exported .pfx file.
- Run the following commands, replacing {certificateName} with the name of your exported .pfx file:
## private key:
openssl pkcs12 -in {certificateName}.pfx -nocerts -out private-key.pem -nodes
## certificate:
openssl pkcs12 -in {certificateName}.pfx -clcerts -nokeys -out certificate.pem
- When prompted, enter the password you used during the .pfx file export process. This will generate a private-key.pem file containing the extracted private key. Ensure this file is stored securely.
- Upload the exported certificate.pem to the registered application in the Azure Portal: App registrations -> Select App -> Certificates & secrets. The Thumbprint value of the registered certificate is needed by the connector.
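The Thumbprint shown in the Azure Portal is, as far as we know, the SHA-1 hash of the DER-encoded certificate written as hexadecimal. It can therefore be cross-checked locally against certificate.pem. The following is a minimal Python sketch under that assumption; the function name `certificate_thumbprint` is illustrative and not part of the connector.

```python
import base64
import hashlib
import re

# Matches one PEM certificate block. openssl pkcs12 may write
# "Bag Attributes" lines before the block; the search skips them.
_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def certificate_thumbprint(pem_text: str) -> str:
    """Return the uppercase hex SHA-1 hash of the first certificate in pem_text."""
    match = _PEM_CERT.search(pem_text)
    if match is None:
        raise ValueError("no PEM certificate block found")
    # The PEM body is base64-encoded DER; the hash is taken over the DER bytes.
    der_bytes = base64.b64decode("".join(match.group(1).split()))
    return hashlib.sha1(der_bytes).hexdigest().upper()
```

To use it, read the contents of certificate.pem into a string and pass it to `certificate_thumbprint`. The result should match the value listed under Certificates & secrets; if it does not, the uploaded certificate is probably not the one the private key belongs to.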
SharePoint API
In order to connect to the SharePoint API, permissions need to be granted to the previously registered Azure application.
The following SharePoint API permission is required using the Application Permissions tab:
- Sites.Selected for specific sites or Sites.FullControl.All for the entire tenant
When limited site access is given with the Sites.Selected permission, an administrator has to grant FullControl access to the sites using the previously mentioned PnP PowerShell command: Grant-PnPAzureADAppSitePermission.
REST API
This method will be deprecated by Microsoft on April 2nd 2026 and is only required when a client secret is used instead of a certificate.
When managed metadata (Term Store) is used or when permissions have to be set, access to the SharePoint Online REST API is required. For this purpose permissions have to be granted to the previously registered Azure application. This can be done for specific sites on site level or for all sites on tenant level.
On site level
This can be done by going to this page: https://<tenantName>.sharepoint.com/<site/>_layouts/15/appinv.aspx (replace <tenantName> with the name of the tenant and <site/> with the relative site URL).
For the App ID field, use the client ID value of the previously created Azure application. Paste this XML snippet in the App's permission request field:
<AppPermissionRequests AllowAppOnlyPolicy="true">
<AppPermissionRequest
Scope="http://sharepoint/content/sitecollection"
Right="Read"
/>
</AppPermissionRequests>
On tenant level
This can be done by going to this page: https://<tenantName>-admin.sharepoint.com/_layouts/15/appinv.aspx (replace <tenantName> with the name of the tenant).
For the App ID field, use the client ID value of the previously created Azure application. Paste this XML snippet in the App's permission request field:
<AppPermissionRequests AllowAppOnlyPolicy="true">
<AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="Read" />
</AppPermissionRequests>
For more information, see Granting access using SharePoint App-Only.
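The site-level and tenant-level appinv.aspx addresses described in this section differ only in the host name and the site path. As a convenience, they could be assembled as in the Python sketch below. The helper names `site_appinv_url` and `tenant_appinv_url` are hypothetical, and the sketch assumes the relative site URL is a path such as sites/test.

```python
from urllib.parse import quote


def site_appinv_url(tenant_name: str, site_path: str) -> str:
    """Build the site-level appinv.aspx URL for a relative site path."""
    # Strip surrounding slashes so the path ends with exactly one "/".
    relative = site_path.strip("/")
    # quote() keeps "/" as-is and percent-encodes characters such as spaces.
    prefix = f"{quote(relative)}/" if relative else ""
    return f"https://{tenant_name}.sharepoint.com/{prefix}_layouts/15/appinv.aspx"


def tenant_appinv_url(tenant_name: str) -> str:
    """Build the tenant-level appinv.aspx URL on the admin site."""
    return f"https://{tenant_name}-admin.sharepoint.com/_layouts/15/appinv.aspx"
```

For a tenant named contoso and the site sites/test, the first helper yields https://contoso.sharepoint.com/sites/test/_layouts/15/appinv.aspx.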
Rate limit
Each application has its own limits within a tenant, which are based on the number of licenses purchased by the organization. In the HTTP request components, the limits are set with a safe margin so that throttling is avoided even for the lowest license tier (0 to 1,000 licenses). More information can be found in the Microsoft documentation on avoiding throttling in SharePoint Online.
Flows
SharePoint Online (1. Content)
The first flow exports the tenant content and users.
Settings
mongoConnection
The Mongo connection string including the database name to connect to.
tenantID
The ID of the tenant to connect to.
clientID
The client ID of the application.
clientSecret
The client secret of the application.
clientCertificateThumbprint
A thumbprint is a unique identifier for the certificate uploaded to Azure Portal. It ensures the application is using the correct certificate.
clientCertificatePrivateKey
The file contents of the private key associated with the certificate.
Authentication can be done using either clientSecret, or clientCertificateThumbprint together with clientCertificatePrivateKey. For more information on how to create a certificate, go to Creating and Exporting a Self-Signed Certificate.
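The rule above (a client secret, or a thumbprint plus private key) can be checked before a flow is started. The Python sketch below is only an illustration of that rule: the function `select_auth_method` is hypothetical, and preferring the certificate when both methods are filled in is an assumption, not documented connector behavior.

```python
from typing import Optional


def select_auth_method(
    client_secret: Optional[str] = None,
    thumbprint: Optional[str] = None,
    private_key: Optional[str] = None,
) -> str:
    """Return "certificate" or "secret" depending on which settings are filled in."""
    has_thumbprint = bool(thumbprint)
    has_private_key = bool(private_key)
    if has_thumbprint and has_private_key:
        # Assumption: the certificate wins when a secret is also present.
        return "certificate"
    if has_thumbprint != has_private_key:
        # Assumption: half a certificate pair is treated as a configuration
        # mistake, even when a client secret is present.
        raise ValueError(
            "clientCertificateThumbprint and clientCertificatePrivateKey "
            "must be set together"
        )
    if client_secret:
        return "secret"
    raise ValueError(
        "set clientSecret, or clientCertificateThumbprint together with "
        "clientCertificatePrivateKey"
    )
```

Failing early on an incomplete certificate pair avoids a harder-to-read authentication error from the API later in the run.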
rootID
The ID of the root site to start crawling using the following format:
<tenantName>.sharepoint.com,<siteId>,<webId>
To get the siteID and webID use the following URLs:
- <siteURL>/_api/site/id to get the siteID
- <siteURL>/_api/web/id to get the webID
Setting the value to root retrieves all sites and underlying content directly under the tenant; sites under /sites are excluded.
When getAllSites is set to true, rootID is ignored, because the requested site is already retrieved when getting all sites, including sub sites.
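As an illustration of the rootID format, the Python sketch below joins the three parts into a single value. The function name `build_root_id` is hypothetical, and the sketch assumes the site and web IDs are passed in exactly as returned by the two _api URLs.

```python
def build_root_id(tenant_name: str, site_id: str, web_id: str) -> str:
    """Join the parts into <tenantName>.sharepoint.com,<siteId>,<webId>."""
    values = [tenant_name.strip(), site_id.strip(), web_id.strip()]
    # A comma inside a part would make the comma-separated value ambiguous.
    if any(not value or "," in value for value in values):
        raise ValueError(
            "tenant name, site ID and web ID must be non-empty and contain no commas"
        )
    tenant, site, web = values
    return f"{tenant}.sharepoint.com,{site},{web}"


# Example with placeholder IDs (not real site or web IDs):
example_root_id = build_root_id(
    "contoso",
    "11111111-1111-1111-1111-111111111111",
    "22222222-2222-2222-2222-222222222222",
)
```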
getAllSites
When set to true, all sites and sub sites within the tenant will be retrieved.
Microsoft Teams are also retrieved when this setting is enabled.
getOneDrives
When set to true, the OneDrives of the users are retrieved.
getGroups
When set to true, the groups are retrieved.
getHiddenLists
When set to true, the hidden lists are retrieved.
Origin
Specifies the origin of the document in the Content Store.
SharePoint Online (2. Content Types & Term Store)
This flow exports the content types for each stored site and document library in the Content Store. Furthermore, for each site it also exports the term store.
Settings
mongoConnection
The Mongo connection string including the database name to connect to.
tenantID
The ID of the tenant to connect to.
tenantName
The name of the tenant to connect to.
clientID
The client ID of the application.
clientSecret
The client secret of the application.
clientCertificateThumbprint
A thumbprint is a unique identifier for the certificate uploaded to Azure Portal. It ensures the application is using the correct certificate.
clientCertificatePrivateKey
The file contents of the private key associated with the certificate.
Origin
Specifies the origin of the document in the Content Store.
Authentication can be done using either clientSecret, or clientCertificateThumbprint together with clientCertificatePrivateKey. For more information on how to create a certificate, go to Creating and Exporting a Self-Signed Certificate.
SharePoint Online (3. Permission Levels)
This flow exports all permission levels of each site in the Content Store. It is only required to run this flow when (custom) permissions are set on objects that are migrated.
Settings
mongoConnection
The Mongo connection string including the database name to connect to.
tenantID
The ID of the tenant to connect to.
tenantName
The name of the tenant to connect to.
clientID
The client ID of the application.
clientSecret
The client secret of the application.
clientCertificateThumbprint
A thumbprint is a unique identifier for the certificate uploaded to Azure Portal. It ensures the application is using the correct certificate.
clientCertificatePrivateKey
The file contents of the private key associated with the certificate.
Origin
Specifies the origin of the document in the Content Store.
Authentication can be done using either clientSecret, or clientCertificateThumbprint together with clientCertificatePrivateKey. For more information on how to create a certificate, go to Creating and Exporting a Self-Signed Certificate.