Troubleshooting SharePoint Performance Issues at Scale

Identifying Performance Bottlenecks

Pinpointing performance bottlenecks is a critical first step when troubleshooting SharePoint farms under heavy load. Monitoring tools provide visibility into which components are under stress by tracking metrics like request rates, response times, and CPU and memory usage. Analyzing usage patterns and running load tests with scaled user simulations determine where scaling is needed.

Robust monitoring provides actionable data to identify overutilized front ends, congested databases, or maxed-out application pools. Common bottlenecks include:

  • Overloaded Web Front Ends – too many requests, high CPU or memory
  • Noisy Neighbors – one site collection monopolizing resources
  • Chatty Custom Code – excessive calls degrading performance
  • Poorly Optimized Queries – unindexed columns or tables scanned
  • Underprovisioned Hardware – bottlenecks from constrained resources

Drilling into hardware utilization metrics reveals undersized or oversubscribed components. Tracking database wait times and disk queue lengths indicates when storage upgrades are warranted. By pinpointing constraints, farms can be right-sized to relieve pressure points.

Using Monitoring Tools to Pinpoint Issues

Robust monitoring gives administrators visibility into farm health and early warning of emerging issues. Native tools like PerfMon and Usage Analytics reveal site collection usage patterns. Third-party tools track KPIs like page load times, request throughput and memory consumption. Popular products include:

  • SolarWinds Web Help Desk – Performance Monitoring dashboards
  • Quest Foglight for SharePoint – Identify “noisy neighbors”
  • SPCAF – SharePoint Code Analysis Framework
  • Microsoft APM – Application Performance Monitoring

By collecting key metrics, monitoring tools detect components exceeding recommended thresholds. Analysis determines whether adding web servers, optimizing pages, or upgrading hardware best resolves the constraints. Capacity planning uses trend data to right-size farms proactively.
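As a rough illustration of the threshold checks these tools perform, the logic can be sketched in a few lines of Python. The metric names and limits here are illustrative assumptions, not SharePoint defaults or any vendor's actual API:

```python
# Minimal sketch of threshold-based bottleneck detection.
# Metric names and limits are illustrative, not SharePoint defaults.
THRESHOLDS = {
    "cpu_percent": 80,          # sustained CPU above this suggests scaling out
    "memory_percent": 85,       # memory pressure on a web front end
    "avg_response_ms": 1000,    # page responses slower than one second
    "disk_queue_length": 2,     # storage struggling to keep up
}

def find_bottlenecks(metrics: dict) -> list[str]:
    """Return the names of metrics exceeding their recommended thresholds."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]

sample = {"cpu_percent": 92, "memory_percent": 60,
          "avg_response_ms": 1450, "disk_queue_length": 1}
print(find_bottlenecks(sample))  # ['cpu_percent', 'avg_response_ms']
```

A real deployment would feed this from PerfMon counters or a monitoring agent rather than a hand-built dictionary.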

Analyzing Usage Patterns and Load

Understanding usage patterns and load characteristics ensures farms meet user demands. Analyzing trend data determines peak concurrency, seasonal spikes and growth projections. Load testing reveals how sites perform under stress when user levels rise. Common test metrics include:

  • Page load times
  • Request throughput
  • Memory consumption
  • CPU utilization
  • Storage response times

Running load tests with scaled user levels identifies when response times deteriorate. Ramping up tests determines maximum capacity before farms throttle traffic. Analyzing results detects bottlenecks like chatty custom code, noisy neighbors monopolizing resources, or constrained front ends. This data informs scaling decisions to maintain performance.
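The ramp-up pattern described above can be sketched as a small test harness. The `fake_request` function below is a stand-in assumption that simulates latency degrading past a capacity knee; in practice it would be replaced by a real HTTP call against a test farm:

```python
import statistics

def fake_request(load: int) -> float:
    """Stand-in for a real page request; simulated latency grows with load.
    Replace with an actual HTTP call against a test farm."""
    base_ms = 200
    return base_ms * (1 + max(0, load - 50) / 25)  # degrades past ~50 users

def ramp_test(levels=(10, 50, 100, 200), samples=5):
    """Ramp user levels and record the median response time at each step."""
    results = {}
    for users in levels:
        times = [fake_request(users) for _ in range(samples)]
        results[users] = statistics.median(times)
    return results

for users, ms in ramp_test().items():
    print(f"{users:>4} users -> {ms:6.0f} ms median")
```

The point where the median climbs sharply between two levels is the capacity knee worth investigating.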

Checking Hardware Utilization

Monitoring infrastructure utilization ensures adequate resources to meet demands. Tracking metrics like CPU, memory, I/O, network and storage use determines when upgrades are necessary. Tools like PerfMon, Nagios and SolarWinds provide insight including:

  • CPU usage and queue lengths
  • Available physical memory
  • Disk capacity and I/O response times
  • Network traffic and latency
  • Database connections and pool utilization

Threshold-based alerts trigger when utilization exceeds recommended levels, indicating contended hardware. Reports help model growth requirements to right size farms. Proactively scaling up prevents unexpected bottlenecks as load increases.

Optimizing Queries and Data Access

Optimizing SharePoint data access improves request response times and throughput. Indexing expedites list queries, while caching mechanisms serve recurring data from memory. Tuning long-running queries also speeds search and retrieval. Areas to optimize include:

  • Index Design – optimize columns used in queries
  • Query Optimization – efficient data access logic
  • Caching Strategies – cache common queries in memory
  • Batching – process multiple requests simultaneously

Combined, these enhancements significantly improve performance for data-intensive operations. Farms handle higher loads and concurrency with optimized data access.

Indexing Lists and Libraries

Creating indexes tuned to query patterns improves list access and search performance. Choosing columns that match filtering and sorting logic keeps indexes effective. Covering indexes further boost throughput by including projected columns.

Maintenance schedules rebuild stale, fragmented indexes. Partitioned indexes divide larger tables for more efficient scans. Horizontal partitioning separates data across servers for parallel access. Before indexing:

  • Identify unused indexes wasting resources
  • Analyze slow running queries to improve
  • Determine optimal index columns to cover queries
  • Partition indexes for very large tables

Best practices guide index creation to enhance performance. Continued monitoring ensures indexes match evolving usage patterns over time as site collections change.

Tuning SQL Queries

Inefficient SQL queries degrade SharePoint performance. Long-running queries monopolize resources, denying other requests. Improperly filtered searches scan excess data and cache bloated result sets. Techniques to optimize queries include:

  • Index optimization – improve seek and scan speeds
  • Predicate tuning – target only required rows
  • Data modeling – structure storage for efficiency
  • Query rewrites – refine logic for conciseness
  • Analyzing plans – detect issues early

Monitoring query plans identifies poorly performing logic. Updating schemas, adding indexes and optimizing filters accelerates data access. Caching frequently reused query results also enhances performance.
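To make the scan-versus-seek distinction concrete, here is a toy Python sketch (not SQL or SharePoint code) showing why an index turns a full pass over every row into a direct lookup. The row data is hypothetical:

```python
# Toy illustration of index seek vs. table scan, using plain Python
# structures; hypothetical data, not a real SharePoint content database.
rows = [{"id": i, "status": "active" if i % 3 else "archived"}
        for i in range(10_000)]

def scan_lookup(target_id):
    """Unindexed: examine every row, like a full table scan."""
    for row in rows:
        if row["id"] == target_id:
            return row
    return None

# Build a hash index on the id column once, then seek directly.
id_index = {row["id"]: row for row in rows}

def indexed_lookup(target_id):
    """Indexed: jump straight to the matching row, like an index seek."""
    return id_index.get(target_id)
```

The scan touches up to 10,000 rows per lookup; the seek touches one. The same asymmetry is why unindexed filter columns show up as expensive operators in query plans.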

Caching Frequently Accessed Data

Caching mechanisms enhance performance by serving commonly accessed data from low-latency memory instead of slower disk storage. Page output, database queries and web service calls can all be cached, improving response times.

SharePoint offers structured caching options:

  • Page Output Caching – caches rendered pages
  • Object Cache – caches database query results
  • Blob Cache – caches BLOB data and files
  • HTTP Runtime Cache – proxies external web requests

Third-party tools also provide distributed caching capabilities for SharePoint farms. Caching improves performance and scalability by reducing database load.
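The core idea behind options like the object cache can be sketched as a small time-limited cache in Python. This is a conceptual sketch only, not tied to any real SharePoint API:

```python
import time

class TTLCache:
    """Minimal sketch of a time-limited cache, similar in spirit to
    SharePoint's object cache; not tied to any real SharePoint API."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key, loader):
        """Return a cached value, or call loader() and cache the result."""
        value, expires = self._store.get(key, (None, 0))
        if time.monotonic() < expires:
            return value                     # cache hit: served from memory
        value = loader()                     # cache miss: hit the slow store
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

calls = 0
def slow_query():
    global calls
    calls += 1                               # counts trips to the slow store
    return "result"

cache = TTLCache(ttl_seconds=60)
cache.get("q1", slow_query)
cache.get("q1", slow_query)                  # second call served from cache
print(calls)  # 1
```

The time-to-live bounds staleness: a shorter TTL keeps data fresher at the cost of more trips to the database.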

Scaling Out SharePoint Farms

Scaling out spreads farms horizontally across new servers instead of vertically upgrading existing nodes. Adding web front ends improves user concurrency, while distributing back-end services prevents overloading. Approaches include:

  • Web front end scaling – handle more browser requests
  • App tier splitting – dedicate services to their own servers
  • SQL sharding – segment databases across instances
  • Distributed caching – cache across the farm

Gradual, staged expansions allow farms to meet rising demands incrementally. Horizontal scaling enables farms to absorb large usage spikes more gracefully.

Adding Web Front Ends

Expanding web front ends improves user concurrency by distributing requests across servers. Adding nodes is preferable to vertically upgrading existing servers, which eventually hit hardware limits. Extra capacity handles usage growth and prevents throttling during traffic spikes while maintaining performance.

Round-robin load balancing evenly distributes inbound requests across the web front ends. Performance monitoring ensures no single node gets overloaded, since traffic shifts can skew distributions. Reserving capacity slots allows servers to be added immediately to absorb unexpected load instead of denying traffic.

Load Balancing Requests

Load balancing mechanisms evenly distribute requests across available servers, preventing individual nodes from becoming bottlenecks. Inbound requests are routed based on server capacity metrics to maintain an optimal spread. Algorithms account for:

  • Server availability
  • Relative node capacity
  • Current load and concurrency
  • Geographic distribution needs

Hardware and software load balancers improve user experience by scaling capacity, not just improving reliability. Performance monitoring ensures even spreading across web front ends, since traffic shifts can cluster requests.
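A weighted round-robin rotation, the simplest algorithm that accounts for relative node capacity, can be sketched as follows. Server names and weights here are hypothetical; real balancers also track health and live load:

```python
import itertools

class WeightedRoundRobin:
    """Sketch of a load balancer that spreads requests across web front
    ends in proportion to their relative capacity. Real balancers also
    track server health and current load; this only models the rotation."""
    def __init__(self, servers: dict[str, int]):
        # Repeat each server by its weight, e.g. {"wfe1": 2, "wfe2": 1}
        # means wfe1 receives twice the traffic of wfe2.
        rotation = [name for name, weight in servers.items()
                    for _ in range(weight)]
        self._cycle = itertools.cycle(rotation)

    def route(self) -> str:
        """Pick the next server in the weighted rotation."""
        return next(self._cycle)

lb = WeightedRoundRobin({"wfe1": 2, "wfe2": 1})
print([lb.route() for _ in range(6)])
# ['wfe1', 'wfe1', 'wfe2', 'wfe1', 'wfe1', 'wfe2']
```

Giving a beefier front end a higher weight is how relative node capacity enters the distribution decision.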

Distributing Services Across Servers

SharePoint farms run multiple services, from web front ends to application and indexing roles. Distributing these across dedicated servers prevents noisy neighbors from constraining performance. Considerations include:

  • Split app tier – own server for memory intensive ops
  • Dedicate indexers – intensive crawl and query workloads
  • Individual SQL instances – segment databases
  • Cache clustering – distributed caching fabric

Isolating resource-intensive processes optimizes hardware usage. Multi-role servers suffer when different workloads compete. Scaling roles horizontally improves efficiency and throughput.

Refactoring Custom Code and Solutions

Refactoring techniques optimize custom developed components like workflows, apps and interfaces. Improving algorithm efficiency, reducing synchronous calls and batching operations boosts performance. Considerations include:

  • Code profiling – target slow logic paths
  • Algorithm optimization – refine data processing
  • Caching – store prior results in memory
  • Asynchronous workflows – prevent call blocking
  • Consolidated requests – combine operations

Refactoring code enhances performance while preserving functionality. Continual optimization is essential as solutions scale and evolving usage introduces new constraints.

Improving Algorithm Efficiency

Inefficient algorithms degrade performance in custom workflows, web parts and other solutions. Code profiling using tools like ULS Viewer highlights hotspots for targeted refinement. Optimization techniques like memoization and dynamic programming improve processing times.

Algorithm changes balance performance gains against the maintainability cost of added complexity. Benchmarking quantifies time savings while monitoring for regressions. With large datasets or intense calculations, even small logic adjustments provide multiplicative scalability improvements.

Reducing Synchronous Calls

Long-running synchronous workflows trigger timeouts and hurt scalability. Converting them to asynchronous logic improves throughput by preventing call blocking while back-end processing completes. Message queuing mechanisms like Azure Service Bus queues scale task distribution.

Latency-tolerant operations are a better fit for asynchronous models. The presentation layer continues serving user interactions while workflows independently process payloads, much like distributed microservices. Reduced contention improves concurrency under load when scaled out.
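The throughput gain from removing call blocking can be sketched with Python's asyncio. The `fetch` coroutine below is a hypothetical stand-in for a back-end call such as a database query or web service request:

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    """Hypothetical stand-in for a back-end call (database, web service)."""
    await asyncio.sleep(delay)  # simulates waiting on the slow back end
    return f"{name} done"

async def main():
    # Sequential calls would take 0.1 + 0.1 + 0.1 seconds; issuing them
    # concurrently takes roughly the duration of the longest single call.
    return await asyncio.gather(
        fetch("profile", 0.1),
        fetch("tasks", 0.1),
        fetch("news", 0.1),
    )

print(asyncio.run(main()))  # ['profile done', 'tasks done', 'news done']
```

Each coroutine yields while waiting, so the scheduler overlaps the delays instead of serializing them, which is the same effect asynchronous workflows have on a farm's throughput.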

Batching Operations

Batching combines multiple requests into consolidated operations to reduce overhead. Processing larger work units improves throughput by optimizing data access and sharing calculations. Gains multiply with scale as batch sizes increase.

Grouping achieves batch efficiencies by accumulating requests or payloads until a threshold triggers processing. Time windows also trigger batches at fixed intervals, such as every 100 ms. Batching enhances scalability but adds latency while requests wait for a threshold to be reached.
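The size-triggered variant of this accumulation logic can be sketched in a few lines of Python. This is an illustrative sketch, not any particular library's API; a production version would also flush on a timer to cap the added latency:

```python
class Batcher:
    """Sketch of size-triggered batching: items accumulate until a
    threshold fires, then the whole batch is handed off as one operation.
    A production version would also flush on a timer (e.g. every 100 ms)."""
    def __init__(self, batch_size: int, process):
        self.batch_size = batch_size
        self.process = process       # callback that handles a full batch
        self.pending = []

    def add(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.process(self.pending)   # one consolidated operation
            self.pending = []

batches = []
b = Batcher(batch_size=3, process=batches.append)
for item in range(7):
    b.add(item)
b.flush()  # drain the remainder
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Seven individual requests became three consolidated operations; the last item waited until the explicit flush, which is exactly the latency trade-off described above.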

Monitoring Performance Improvements

Continuous monitoring verifies that enhancements deliver the expected performance improvements while ensuring no regressions. Comparative benchmarking before and after optimization quantifies gains. KPIs help validate that bottlenecks are resolved and that improvements scale under projected loads.

Establishing Benchmarks

Benchmarking establishes baseline metrics to quantify optimization impacts using load tests. Scripted user scenarios mimic real browsing patterns hitting key pages. Monitoring tools track vital KPIs like request rates, response times, error frequency, CPU usage and memory consumption before and after changes.

Comparing metrics highlights performance lift while ensuring no degradations. Tests use target user volumes across use cases to validate scalability. Optimizations target the most prevalent high-resource operations for maximum benefit.
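The before/after comparison reduces to a small calculation once the samples are collected. The response-time numbers below are hypothetical, invented purely to show the arithmetic:

```python
import statistics

def summarize(samples_ms: list) -> dict:
    """Reduce raw response-time samples to the KPIs worth comparing."""
    ordered = sorted(samples_ms)
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[int(len(ordered) * 0.95) - 1],
    }

# Hypothetical samples captured by the same scripted scenario
# before and after an optimization pass.
before = summarize([420, 450, 480, 510, 530, 560, 590, 640, 700, 950])
after = summarize([210, 220, 230, 240, 250, 260, 270, 280, 300, 360])

lift = (before["median_ms"] - after["median_ms"]) / before["median_ms"]
print(f"median improved {lift:.0%}")  # median improved 53%
```

Comparing the 95th percentile alongside the median matters: an optimization that helps the typical request but worsens the tail is a regression for the users who notice most.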

Continued Usage Monitoring

Ongoing monitoring reveals when optimizations no longer match evolving usage patterns. Monthly analysis highlights trend shifts like changed page views, seasonal traffic swings or new features requiring more resources. Spotting growing contention early allows teams to counter it proactively.

Dashboards track key metrics over time, visualizing usage shifts. Automated alerts trigger when thresholds are breached, indicating rising constraints. Proactive monitoring prevents unexpected bottlenecks as sites scale.

Tuning Based on New Data

Continual optimization leverages latest usage data to maintain efficiency as workloads change. New features alter traffic profiles, seasonal events spike loads and growing user bases increase concurrency.

Revisiting indexing, caching and scaling strategies ensures they target top pages and heaviest operations. Additional web front ends handle monthly user increases without degrading performance. Optimization counters trending issues noted in monitoring.
