Get All Files And Folders Under A Sharepoint Folder In A Single Api Call

The Problem: Needing to Retrieve Many Files

When working with SharePoint folders that contain large numbers of files, a common task is to retrieve metadata or content for all files stored under a parent folder. However, the SharePoint REST API's file endpoints are oriented toward single-file access, and it is not obvious how to pull details for every file in one call.

Issuing separate API calls for every file is inefficient and difficult to scale. As the number of files grows into the hundreds or thousands, performance degrades and scripts risk exceeding SharePoint throttling limits.

Therefore, developers need a way to efficiently retrieve all files under a SharePoint folder in a single API request to avoid performance pitfalls.

Querying the SharePoint REST API

SharePoint provides a REST API with endpoints for querying folder and file information. We can construct requests to these endpoints to return metadata on files matching certain criteria.

For example, the following GET request queries for all files under the target folder using the host web URL, folder server-relative URL, and the $select and $expand OData query options:

GET https://example.sharepoint.com/sites/site1/_api/web/GetFolderByServerRelativeUrl('/sites/site1/documents')/Files?$select=Name,ServerRelativeUrl&$expand=ListItemAllFields

This request returns a JSON object with properties for each matching file, including the name, URL, and columns from the associated list item.

Constructing the API Query URL

To understand how to tailor the query for different needs, let’s break down each component:

  • Host Web URL – The URL representing the parent site we want to search under – e.g. https://example.sharepoint.com/sites/site1
  • Folder Server-Relative URL – The URL of the folder to query relative to the host web – e.g. /sites/site1/documents
  • $select – Controls fields returned for each file – e.g. Name, ServerRelativeUrl
  • $expand – Expand to get metadata from the associated list item

By modifying these parameters, the query can be customized to return only required file properties or extended for more metadata via list item expansion.
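As a sketch, the pieces above can be assembled in Python. The site and folder values below are the article's running examples; urllib.parse.quote percent-encodes the folder path so spaces and special characters survive in the URL:

```python
from urllib.parse import quote

# Example values from the article
site_url = "https://example.sharepoint.com/sites/site1"
folder_url = "/sites/site1/documents"

# Assemble the endpoint with the $select and $expand options,
# encoding the folder path (slashes are kept as-is)
api_url = (
    f"{site_url}/_api/web/GetFolderByServerRelativeUrl('{quote(folder_url)}')"
    "/Files?$select=Name,ServerRelativeUrl&$expand=ListItemAllFields"
)

print(api_url)
```

Building the URL this way keeps the host web, folder path, and query options as separate variables that can be swapped per site.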

Parsing the JSON Response

The SharePoint REST API returns query results in JSON format. The response contains a “value” array with one element per matching file:

{
  "value":[
    {
      "Name":"sales.xlsx",
      "ServerRelativeUrl":"/sites/site1/documents/sales.xlsx",
      "ListItemAllFields": {
        "Title": "Q3 Sales Report", 
        "Modified": "2023-02-01"
      }
    },
    {
      "Name":"forecast.docx",
      "ServerRelativeUrl":"/sites/site1/documents/forecast.docx",    
      "ListItemAllFields": {
        "Title": "2023 Forecast",
        "Modified": "2023-01-15"
      }
    }
  ] 
}

When programmatically handling the response, we loop through the “value” array to access the properties of each file item.

The properties can then be processed or saved as needed – for example, collecting specific metadata like name and modified date into a summary array or report.
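A minimal sketch of that loop in Python, using a response shaped like the JSON sample above, might collect the name and modified date into a summary list:

```python
# Sample response shaped like the JSON shown earlier
response = {
    "value": [
        {"Name": "sales.xlsx",
         "ListItemAllFields": {"Title": "Q3 Sales Report", "Modified": "2023-02-01"}},
        {"Name": "forecast.docx",
         "ListItemAllFields": {"Title": "2023 Forecast", "Modified": "2023-01-15"}},
    ]
}

# Loop through the "value" array and collect selected metadata
summary = [
    {"name": f["Name"], "modified": f["ListItemAllFields"]["Modified"]}
    for f in response["value"]
]

print(summary)
```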

Handling Large Numbers of Files

A key benefit of the single folder query is consolidation of the metadata for potentially thousands of child files into one manageable payload. But large responses still present analysis challenges.

For example, holding metadata for every file in memory at once can exhaust available RAM, and iterating over a very large file array in a single pass can take an unreasonable amount of time.

In cases of 100,000+ files, strategies like streaming parsing, batch processing chunks of files, multithreading, and pooling connections can optimize performance.
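As one illustration of the batch-processing idea, a small generator (hypothetical helper, standard library only) can slice any iterable of file records into fixed-size chunks without materializing intermediate lists:

```python
from itertools import islice

def batches(items, size):
    """Yield successive chunks of at most `size` items from any iterable."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

# Simulated file records: 250 entries processed in chunks of 100
files = [{"Name": f"file{i}.txt"} for i in range(250)]
chunk_sizes = [len(b) for b in batches(files, 100)]
print(chunk_sizes)  # chunks of 100, 100, and 50
```

Each chunk can then be handed to a worker thread or written out before the next chunk is pulled, which keeps peak memory bounded by the chunk size.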

Example Code in Python

In Python, we can use the requests module to send the SharePoint query and parse the response. This example processes a large result set in chunks of 100 files:

import requests
from requests.auth import HTTPBasicAuth  # or supply an OAuth bearer token header

# Construct request URL
site_url = 'https://example.sharepoint.com/sites/site1'
folder_url = '/sites/site1/documents'
api_url = (f"{site_url}/_api/web/GetFolderByServerRelativeUrl('{folder_url}')/Files?"
           "$select=Name,ServerRelativeUrl&$expand=ListItemAllFields")

# Send GET request, asking for a JSON response
auth = HTTPBasicAuth('user@example.com', 'password')  # substitute real credentials
resp = requests.get(api_url, auth=auth,
                    headers={'Accept': 'application/json;odata=nometadata'})
resp.raise_for_status()
files = resp.json()['value']

# Batch process files in chunks of 100
batch_size = 100
for i in range(0, len(files), batch_size):
    batch = files[i:i + batch_size]

    for file in batch:
        # Print name and modified date, for example
        print(file['Name'], file['ListItemAllFields']['Modified'])

Note that parsing the JSON response still loads the full file array into memory; batching only keeps downstream processing in fixed-size chunks. The batch size can be tuned based on available resources.

Example Code in JavaScript

In client-side JavaScript, the PnPjs library (@pnp/sp) provides a fluent wrapper over the same REST endpoints:

import { sp } from "@pnp/sp";

const folderUrl = "/sites/site1/documents";

async function listFiles() {
  const files = await sp.web.getFolderByServerRelativeUrl(folderUrl).files
    .select("Name", "ServerRelativeUrl", "ListItemAllFields/Title", "ListItemAllFields/Modified")
    .expand("ListItemAllFields")
    .get();

  // Print name and modified date for each file
  for (const file of files) {
    console.log(file.Name, file.ListItemAllFields.Modified);
  }
}

The select() and expand() methods construct the $select and $expand query options behind the scenes. Pagination can also be added to handle large responses.

Optimizing for Performance

When dealing with folders containing from hundreds of thousands to millions of files, further optimization is required to avoid bottlenecks.

Understanding SharePoint Query Limits

By default, SharePoint's list view threshold caps query results at 5,000 items, and request URLs are limited to roughly 2,000 characters. Attempting to retrieve millions of files in one call exceeds these thresholds.

The limits can be adjusted through SharePoint throttling configuration, but raising them trades off server performance.

Paginating Requests for Large Folders

Rather than removing limits entirely, a better solution is to paginate requests. Note that SharePoint's OData implementation does not honor $skip for list item queries; instead, request a page size with $top and follow the continuation link the server returns with each page:

// Request the first 1000 files
api_url = "...&$top=1000"

// When more results remain, the response carries a continuation link:
//   "odata.nextLink": "...&$top=1000&$skiptoken=..."
// Follow that URL repeatedly until it is absent

Handling 1000 files per call is more efficient than a single batch of 100,000+. Pagination allows processing files in manageable blocks.
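A paged retrieval loop might look like the following sketch in Python. To keep it testable without a live tenant, the HTTP call is abstracted as a `fetch` callable (a hypothetical stand-in for a wrapper around requests.get) that returns the decoded JSON body; SharePoint includes an "odata.nextLink" entry in each page while more results remain:

```python
def iter_all_files(fetch, api_url, page_size=1000):
    """Yield file entries page by page, following the server's next-page link.

    `fetch` is any callable taking a URL and returning the decoded JSON body.
    """
    url = f"{api_url}&$top={page_size}"
    while url:
        data = fetch(url)
        yield from data.get("value", [])
        # Present only while more pages remain
        url = data.get("odata.nextLink")

# Stub fetcher simulating two pages of results
pages = {
    "https://example/api&$top=2": {
        "value": [{"Name": "a.txt"}, {"Name": "b.txt"}],
        "odata.nextLink": "https://example/api?page2",
    },
    "https://example/api?page2": {"value": [{"Name": "c.txt"}]},
}
names = [f["Name"] for f in iter_all_files(pages.get, "https://example/api", page_size=2)]
print(names)  # ['a.txt', 'b.txt', 'c.txt']
```

Because the function is a generator, each page can be processed and discarded before the next request is issued, keeping memory usage proportional to the page size rather than the folder size.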

Caching Metadata to Minimize Requests

Finally, performance can be improved by caching file metadata locally after the initial retrieval, rather than requesting it from SharePoint every time.

This metadata cache can then be checked on subsequent executions before making API calls to load updated data. By reducing calls up to 70-80%, large workflows can scale efficiently.
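One minimal way to sketch such a cache, assuming a local JSON file and a simple time-to-live (the path, TTL, and fetcher names below are hypothetical):

```python
import json
import os
import time

CACHE_PATH = "sp_file_cache.json"   # hypothetical local cache file
CACHE_TTL = 3600                    # seconds before the cache is considered stale

def load_metadata(fetch_fresh):
    """Return cached metadata if still fresh, otherwise re-fetch and cache it."""
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            cached = json.load(f)
        if time.time() - cached["fetched_at"] < CACHE_TTL:
            return cached["files"]
    files = fetch_fresh()
    with open(CACHE_PATH, "w") as f:
        json.dump({"fetched_at": time.time(), "files": files}, f)
    return files

# Start from a clean slate for the demo
if os.path.exists(CACHE_PATH):
    os.remove(CACHE_PATH)

# Stub fetcher standing in for the real SharePoint query
calls = []
def fake_fetch():
    calls.append(1)
    return [{"Name": "sales.xlsx"}]

first = load_metadata(fake_fetch)
second = load_metadata(fake_fetch)  # served from the cache, no second fetch
print(len(calls))  # 1
```

A real workflow would also want an invalidation signal, such as comparing the folder's item count or a change token, rather than relying on the TTL alone.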

Conclusion

Retrieving all files under a SharePoint folder requires careful API query construction and response handling, especially for large sets. Techniques like REST pagination and local metadata caching make it possible to scale robust solutions.

With an understanding of the available endpoints and parameters as outlined here, developers can optimize scripts to extract SharePoint file metadata in a single efficient request.
