Version: 4.55.0

SharePoint Online source connector

The SharePoint Online source connector consists of multiple flows, which are described below. These flows use the MS Graph API v1 and the SharePoint Online REST API.

Features

  • Exporting content structure
  • Exporting users
  • Exporting groups
  • Exporting user OneDrives
  • Exporting content types
  • Exporting managed metadata

Requirements

For the connector to work, an Azure application with certain permissions is required. An Azure administrator is needed to create this application and to grant permissions.

Graph API

In order to connect to the Graph API an application needs to be registered using the Azure portal. For more information see MS Graph API v1 authentication.

The following Graph API permissions are required. Add them as application permissions (the Application Permissions tab):

  • Sites.Selected for specific sites or Sites.Read.All for the entire tenant
  • User.Read.All
  • Group.Read.All (optional for retrieving groups)
  • TermStore.Read.All (optional for retrieving managed metadata)
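For orientation, an app-only Graph token is normally obtained through the OAuth 2.0 client credentials flow. The sketch below only assembles such a request for the client secret variant and does not send it. The function name and the placeholder secret are illustrative assumptions (the tenant and client ID reuse example values from this page), and the connector performs this step itself.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return (url, form_body) for a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" asks for the application permissions already granted
        # to the app registration (for example Sites.Selected).
        "scope": "https://graph.microsoft.com/.default",
    }
    return url, urlencode(form)


# Example values only; the secret is a placeholder.
url, body = build_token_request(
    "contoso.onmicrosoft.com",
    "9114b753-f288-41be-b81c-16afaa7c79ae",
    "placeholder-secret",
)
print(url)
```

The returned body would be sent as an application/x-www-form-urlencoded POST. Which permissions end up in the token depends on what an administrator has consented to, not on the request itself.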

When limited site access is given with the Sites.Selected permission, an administrator has to grant access to the sites. For example, this PnP PowerShell command can be used: Grant-PnPAzureADAppSitePermission. Example script using a certificate:

# Azure application with Graph API `Sites.FullControl.All` permission
$clientId = '5f5b92f0-2097-4a14-a152-3e8172951d67'
$certificatePassword = (ConvertTo-SecureString -AsPlainText 'certificate_password' -Force)

# connect to SPO using a certificate
Connect-PnPOnline -Url "https://contoso.sharepoint.com" -ClientId $clientId -CertificatePath 'C:/spo/certificate.pfx' -CertificatePassword $certificatePassword -Tenant 'contoso.onmicrosoft.com'

# grant permission to the site for the Azure application that the connector will use
$SiteURL = "https://contoso.sharepoint.com/sites/test"
Grant-PnPAzureADAppSitePermission -AppId '9114b753-f288-41be-b81c-16afaa7c79ae' -DisplayName 'Xill4' -Site $SiteURL -Permissions 'FullControl'

For more information, see Updates on controlling app specific access on specific SharePoint sites (Sites.Selected).

Furthermore, a self-signed certificate will have to be generated, which is then registered in the Azure application.

Creating and Exporting a Self-Signed Certificate

For detailed instructions on creating a self-signed public certificate and exporting it along with its private key, refer to the Microsoft documentation.

Once the public certificate with its private key is exported in a .pfx file, an additional step is required to extract the private key and certificate for use in your flow.

Extracting the Private Key in PowerShell

  1. Navigate to the directory containing the exported .pfx file.
  2. Run the following commands, replacing {certificateName} with the name of your exported .pfx file:
## private key:
openssl pkcs12 -in {certificateName}.pfx -nocerts -out private-key.pem -nodes
## certificate:
openssl pkcs12 -in {certificateName}.pfx -clcerts -nokeys -out certificate.pem
  3. When prompted, enter the password you used during the .pfx file export process. The commands generate a private-key.pem file containing the extracted private key and a certificate.pem file containing the certificate. Ensure the private key file is stored securely.

  4. Upload the exported certificate.pem to the registered application in the Azure Portal: App registrations -> Select App -> Certificates & secrets. The Thumbprint value of the registered certificate is needed by the connector.
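As a cross-check on the Thumbprint value shown in the portal, the thumbprint can also be computed locally from certificate.pem. This is a minimal sketch, assuming the portal displays the SHA-1 hash of the DER-encoded certificate in uppercase hex (the usual convention); the function name is hypothetical.

```python
import base64
import hashlib
import re

# Matches the first PEM certificate block; openssl may write
# "Bag Attributes" lines before it, which are skipped this way.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def certificate_thumbprint(pem_text: str) -> str:
    """Return the SHA-1 thumbprint (uppercase hex) of the first certificate."""
    match = PEM_CERT.search(pem_text)
    if match is None:
        raise ValueError("no PEM certificate block found")
    # Remove line breaks, then decode the base64 payload to DER bytes.
    der = base64.b64decode("".join(match.group(1).split()))
    return hashlib.sha1(der).hexdigest().upper()


# Usage (assumes certificate.pem is in the current directory):
# with open("certificate.pem") as handle:
#     print(certificate_thumbprint(handle.read()))
```

If the computed value differs from the one in the portal, the uploaded certificate is most likely not the one that belongs to private-key.pem.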

SharePoint API

In order to connect to the SharePoint API, permissions need to be granted to the previously registered Azure application. The following SharePoint API permission is required using the Application Permissions tab:

  • Sites.Selected for specific sites or Sites.FullControl.All for the entire tenant

When limited site access is given with the Sites.Selected permission, an administrator has to grant FullControl access to the sites using the previously mentioned PnP PowerShell command: Grant-PnPAzureADAppSitePermission.

REST API

note

This method will be deprecated by Microsoft on April 2nd 2026 and is only required when a client secret is used instead of a certificate.

When managed metadata (Term Store) is used or when permissions have to be set, access to the SharePoint Online REST API is required. For this purpose, permissions have to be granted to the previously registered Azure application. This can be done for specific sites at site level or for all sites at tenant level.

On site level

This can be done by going to this page: https://<tenantName>.sharepoint.com/<site/>_layouts/15/appinv.aspx (replace <tenantName> with the name of the tenant and <site/> with the relative site URL, including its trailing slash).

For the App ID field, use the client ID value of the previously created Azure application. Paste this XML snippet into the App's permission request field:

<AppPermissionRequests AllowAppOnlyPolicy="true">
<AppPermissionRequest Scope="http://sharepoint/content/sitecollection" Right="Read" />
</AppPermissionRequests>

On tenant level

This can be done by going to this page: https://<tenantName>-admin.sharepoint.com/_layouts/15/appinv.aspx (replace <tenantName> with the name of the tenant).

For the App ID field, use the client ID value of the previously created Azure application. Paste this XML snippet into the App's permission request field:

<AppPermissionRequests AllowAppOnlyPolicy="true">
<AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="Read" />
</AppPermissionRequests>

For more information, see Granting access using SharePoint App-Only.
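The two appinv.aspx URLs and the permission request XML differ only in the tenant name, the site path, and the scope. The sketch below generates them so they can be copied into the browser and the form; the function names are hypothetical and nothing is sent to SharePoint.

```python
import xml.etree.ElementTree as ET


def site_appinv_url(tenant_name: str, site_path: str) -> str:
    """URL of the site-level appinv page, e.g. for 'sites/test'."""
    path = site_path.strip("/")
    prefix = f"{path}/" if path else ""
    return f"https://{tenant_name}.sharepoint.com/{prefix}_layouts/15/appinv.aspx"


def tenant_appinv_url(tenant_name: str) -> str:
    """URL of the tenant-level appinv page on the admin site."""
    return f"https://{tenant_name}-admin.sharepoint.com/_layouts/15/appinv.aspx"


def permission_request_xml(scope: str, right: str = "Read") -> str:
    """Build the AppPermissionRequests snippet for the given scope."""
    root = ET.Element("AppPermissionRequests", AllowAppOnlyPolicy="true")
    ET.SubElement(root, "AppPermissionRequest", Scope=scope, Right=right)
    return ET.tostring(root, encoding="unicode")


print(site_appinv_url("contoso", "sites/test"))
print(permission_request_xml("http://sharepoint/content/sitecollection"))
print(tenant_appinv_url("contoso"))
print(permission_request_xml("http://sharepoint/content/tenant"))
```

The generated XML is written on a single line, which is equivalent to the multi-line snippets above.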

Rate limit

Every application has its own limits in a tenant, which are based on the number of licenses purchased per organization. The limits configured in the HTTP request components are chosen to avoid throttling at the lowest license tier (0 - 1k licenses), with a safe margin. More info can be found here.
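The connector's HTTP components already apply these limits. For custom scripts that call the same APIs, the sketch below shows one way to decide how long to wait when throttling still occurs. It assumes throttled responses use HTTP status 429 or 503 and may carry a Retry-After header in whole seconds; the function name, the exponential fallback, and the 120 second cap are illustrative choices, not connector settings.

```python
def retry_delay(status: int, headers: dict, attempt: int, cap: float = 120.0):
    """Return the seconds to wait before a retry, or None if none is needed."""
    if status not in (429, 503):
        return None
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None and retry_after.strip().isdigit():
        # Prefer the wait time requested by the server.
        return float(retry_after)
    # Fallback: exponential backoff (1s, 2s, 4s, ...) limited by the cap.
    return min(cap, 2.0 ** attempt)


print(retry_delay(429, {"Retry-After": "30"}, attempt=0))
print(retry_delay(503, {}, attempt=3))
```

Honouring the server-provided value first matters because retrying earlier than requested can extend the throttling period.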

Flows

SharePoint Online (1. Content)

The first flow exports the tenant content and users.

Settings

mongoConnection

The Mongo connection string including the database name to connect to.
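Because the database name has to be part of the connection string, a quick check before running the flow can catch a missing name. This is a minimal sketch using only the standard library; the function name and the example database names are hypothetical.

```python
from urllib.parse import urlsplit


def database_name(connection_string: str) -> str:
    """Return the database name embedded in a MongoDB connection string."""
    parts = urlsplit(connection_string)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError("not a MongoDB connection string")
    # The database name is the path component, without the leading slash.
    name = parts.path.lstrip("/")
    if not name:
        raise ValueError("connection string does not name a database")
    return name


print(database_name("mongodb://localhost:27017/contentstore"))
```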

tenantID

The ID of the tenant to connect to.

clientID

The client ID of the application.

clientSecret

The client secret of the application.

clientCertificateThumbprint

A thumbprint is a unique identifier for the certificate uploaded to Azure Portal. It ensures the application is using the correct certificate.

clientCertificatePrivateKey

The file contents of the private key associated with the certificate.

note

Authentication can be done using either clientSecret, or clientCertificateThumbprint together with clientCertificatePrivateKey. For more information on how to create a certificate, go to Creating and Exporting a Self-Signed Certificate.
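The rule in this note can be expressed as a small check on the settings. The sketch below is illustrative only: the function name is hypothetical, and preferring the certificate when both kinds of credentials are present is an assumption, not documented connector behaviour.

```python
def auth_mode(settings: dict) -> str:
    """Return 'certificate' or 'secret' depending on which settings are filled."""
    has_thumbprint = bool(settings.get("clientCertificateThumbprint"))
    has_key = bool(settings.get("clientCertificatePrivateKey"))
    has_secret = bool(settings.get("clientSecret"))
    # The certificate settings only count when both of them are present.
    if has_thumbprint and has_key:
        return "certificate"
    if has_secret:
        return "secret"
    raise ValueError(
        "set clientSecret, or both clientCertificateThumbprint "
        "and clientCertificatePrivateKey"
    )


print(auth_mode({"clientSecret": "placeholder-secret"}))
```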

rootID

The ID of the root site to start crawling using the following format:

<tenantName>.sharepoint.com,<siteId>,<webId>

To get the siteID and webID use the following URLs:

  • <siteURL>/_api/site/id to get the siteID
  • <siteURL>/_api/web/id to get the webID

Setting the value to root retrieves all sites and underlying content directly under the tenant; sites under /sites are excluded. When getAllSites is set to true, rootID is ignored, because the requested root is already retrieved when all sites, including sub sites, are fetched.
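The rootID format and the two lookup URLs described above can be sketched as follows. The function names and the GUIDs are hypothetical; the GUID validation is an added convenience and not something the connector is known to require.

```python
import uuid


def id_lookup_urls(site_url: str) -> dict:
    """Return the REST URLs that report the siteID and webID of a site."""
    base = site_url.rstrip("/")
    return {"siteID": f"{base}/_api/site/id", "webID": f"{base}/_api/web/id"}


def build_root_id(tenant_name: str, site_id: str, web_id: str) -> str:
    """Compose '<tenantName>.sharepoint.com,<siteId>,<webId>'."""
    for value in (site_id, web_id):
        uuid.UUID(value)  # raises ValueError when the value is not a GUID
    return f"{tenant_name}.sharepoint.com,{site_id},{web_id}"


print(id_lookup_urls("https://contoso.sharepoint.com/sites/test"))
print(build_root_id(
    "contoso",
    "11111111-2222-3333-4444-555555555555",  # made-up siteID
    "66666666-7777-8888-9999-000000000000",  # made-up webID
))
```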

getAllSites

When set to true all sites and sub sites within the tenant will be retrieved.

note

Microsoft Teams are also retrieved when this setting is enabled.

getOneDrives

When set to true the OneDrives of the users are retrieved.

getGroups

When set to true the groups are retrieved.

getHiddenLists

When set to true the hidden lists are retrieved.

Origin

Specifies the origin of the document in the Content Store.

SharePoint Online (2. Content Types & Term Store)

This flow exports the content types for each stored site and document library in the Content Store. Furthermore, for each site it also exports the term store.

Settings

mongoConnection

The Mongo connection string including the database name to connect to.

tenantID

The ID of the tenant to connect to.

tenantName

The name of the tenant to connect to.

clientID

The client ID of the application.

clientSecret

The client secret of the application.

clientCertificateThumbprint

A thumbprint is a unique identifier for the certificate uploaded to Azure Portal. It ensures the application is using the correct certificate.

clientCertificatePrivateKey

The file contents of the private key associated with the certificate.

Origin

Specifies the origin of the document in the Content Store.

note

Authentication can be done using either clientSecret, or clientCertificateThumbprint together with clientCertificatePrivateKey. For more information on how to create a certificate, go to Creating and Exporting a Self-Signed Certificate.

SharePoint Online (3. Permission Levels)

This flow exports all permission levels of each site in the Content Store. It is only required to run this flow when (custom) permissions are set on objects that are migrated.

Settings

mongoConnection

The Mongo connection string including the database name to connect to.

tenantID

The ID of the tenant to connect to.

tenantName

The name of the tenant to connect to.

clientID

The client ID of the application.

clientSecret

The client secret of the application.

clientCertificateThumbprint

A thumbprint is a unique identifier for the certificate uploaded to Azure Portal. It ensures the application is using the correct certificate.

clientCertificatePrivateKey

The file contents of the private key associated with the certificate.

Origin

Specifies the origin of the document in the Content Store.

note

Authentication can be done using either clientSecret, or clientCertificateThumbprint together with clientCertificatePrivateKey. For more information on how to create a certificate, go to Creating and Exporting a Self-Signed Certificate.