What Information Can You Get From Microsoft To Troubleshoot SharePoint Online?

Troubleshooting SharePoint Online Issues

When working to resolve problems in SharePoint Online, the first steps are to identify the root cause and collect diagnostic data to analyze. Review any error messages or codes that appear to learn which system or feature is failing. Check the Office 365 service health pages for notifications of ongoing service issues. Then try to reproduce the problem by testing the site functionality that is broken.

Identifying the Root Cause

Pinpointing the origin of SharePoint Online errors is essential for effective troubleshooting. Administrators should record the full error messages and correlation IDs that users see. Unlike on-premises SharePoint, SharePoint Online does not give tenant administrators direct access to the Unified Logging Service (ULS) logs; Microsoft Support uses the correlation IDs to find the matching ULS entries. Compare error details across users and sites to identify common threads pointing to a root issue. Check service health for your tenant in the Microsoft 365 admin center at https://admin.microsoft.com to see whether Microsoft has posted any incidents or advisories affecting SharePoint Online. Manually test the broken functionality, such as search, workflows, or permissions, to reproduce the issue.

Reviewing error messages and logs

Error messages displayed to users provide clues about the root cause of an issue. SharePoint Online error pages typically show a correlation ID along with a date and time; Microsoft Support can use these to look up the corresponding entries in the service's ULS logs, which tenant administrators cannot open directly. Many error codes and messages are covered by Microsoft troubleshooting articles that describe causes and resolutions, so search for the exact text. Compare error strings and correlation IDs across users and sites to find common themes. Error details combined with service health status and testing help narrow down where SharePoint Online problems originate.
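The correlation ID is the part of an error message that Microsoft Support can act on, so it is worth extracting it from the text users paste into tickets. The minimal Python sketch below assumes the copied text contains a label like "Correlation ID:" followed by a GUID, which matches what SharePoint error pages commonly show but may vary; the function names and the sample text are hypothetical.

```python
import re
from typing import Optional

# "Correlation ID" label, optional colon, then a GUID such as
# 3f2a1b4c-0d9e-4f10-8a7b-6c5d4e3f2a1b. The label wording is an assumption
# about the copied error text; adjust it if your users see something else.
CORRELATION_PATTERN = re.compile(
    r"correlation\s*id\s*:?\s*"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def extract_correlation_id(error_text: str) -> Optional[str]:
    """Return the first labelled correlation ID in lowercase, or None if absent."""
    match = CORRELATION_PATTERN.search(error_text)
    return match.group(1).lower() if match else None


def unique_correlation_ids(reports: list[str]) -> list[str]:
    """Collect distinct IDs from many user reports, keeping first-seen order."""
    ids = (extract_correlation_id(text) for text in reports)
    return list(dict.fromkeys(cid for cid in ids if cid))


# Hypothetical text copied from an error page.
sample = """Sorry, something went wrong
Correlation ID: 3F2A1B4C-0D9E-4F10-8A7B-6C5D4E3F2A1B
Date and Time: 5/1/2024 10:15:03 AM"""
print(extract_correlation_id(sample))  # 3f2a1b4c-0d9e-4f10-8a7b-6c5d4e3f2a1b
```

Normalizing to lowercase makes IDs copied by different users compare equal regardless of how they were typed or pasted.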

Checking service health status

The service health dashboard in the Microsoft 365 admin center (Health > Service health at https://admin.microsoft.com) displays current incidents and advisories involving SharePoint Online. Administrators should check it regularly to see whether Microsoft has identified issues with the SharePoint Online service that affect their tenant. Each posting describes the affected feature, the user impact, and status updates, and may include a workaround. If Microsoft posts a relevant issue, it may explain the site problems without further SharePoint troubleshooting.
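Service health postings can also be read programmatically. Microsoft Graph exposes them at `GET https://graph.microsoft.com/v1.0/admin/serviceAnnouncement/issues`, which requires the ServiceHealth.Read.All permission. The sketch below does not make the HTTP call; it filters an already-retrieved response body. It assumes the serviceHealthIssue field names `id`, `service`, `isResolved`, `classification`, and `title`, and the service name "SharePoint Online", all of which should be checked against the current Graph reference. The helper names and sample data are hypothetical.

```python
from typing import Any

SHAREPOINT_SERVICE = "SharePoint Online"  # assumed value of the "service" field


def open_sharepoint_issues(response_body: dict[str, Any]) -> list[dict[str, Any]]:
    """Return unresolved SharePoint Online issues from a Graph issues response.

    Graph returns postings in the "value" array; paging via @odata.nextLink
    is not handled in this sketch.
    """
    return [
        issue
        for issue in response_body.get("value", [])
        if issue.get("service") == SHAREPOINT_SERVICE
        and not issue.get("isResolved", False)
    ]


def describe(issue: dict[str, Any]) -> str:
    """One-line summary: id, classification, and title."""
    return f"{issue.get('id', '?')} [{issue.get('classification', '?')}] {issue.get('title', '')}"


# Hypothetical response body.
body = {
    "value": [
        {"id": "SP100001", "service": "SharePoint Online", "isResolved": False,
         "classification": "incident", "title": "Search results delayed"},
        {"id": "SP100002", "service": "SharePoint Online", "isResolved": True,
         "classification": "advisory", "title": "Resolved sync delay"},
        {"id": "EX100003", "service": "Exchange Online", "isResolved": False,
         "classification": "incident", "title": "Mail delivery delays"},
    ]
}
for issue in open_sharepoint_issues(body):
    print(describe(issue))  # SP100001 [incident] Search results delayed
```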

Testing intended functionality

Reproducing the failing processes or features in the SharePoint Online tenant is key to observing where they fail. Administrators should open the sites, lists, libraries, and pages that exhibit problems and repeat the failing actions, such as running searches, changing permissions, triggering workflows, or uploading documents. During each attempt, record the account used, the exact steps, the UTC time, and any correlation ID shown; a browser network trace (HAR file) also captures the request details. If the issue can be reproduced reliably, the observed failures indicate more clearly which part of SharePoint is at fault.
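Keeping each reproduction attempt as a structured record makes it easy to hand the evidence to Microsoft later. This minimal Python sketch assumes the test harness already has the HTTP status and response headers for each attempt. SharePoint responses normally carry the request's correlation ID in the `SPRequestGuid` header. The record fields, function names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass
class ReproAttempt:
    step: str                      # what was tried, in plain words
    account: str                   # test account used for the attempt
    status_code: int               # HTTP status observed
    correlation_id: Optional[str]  # from the SPRequestGuid header, if present
    time_utc: str                  # ISO 8601 timestamp in UTC


def record_attempt(step: str, account: str, status_code: int,
                   headers: Mapping[str, str]) -> ReproAttempt:
    """Build a record, matching header names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return ReproAttempt(
        step=step,
        account=account,
        status_code=status_code,
        correlation_id=lowered.get("sprequestguid"),
        time_utc=datetime.now(timezone.utc).isoformat(timespec="seconds"),
    )


def failure_rate(attempts: list[ReproAttempt]) -> float:
    """Share of attempts that returned an HTTP error (status 400 or higher)."""
    if not attempts:
        return 0.0
    return sum(a.status_code >= 400 for a in attempts) / len(attempts)


attempts = [
    record_attempt("upload report.xlsx to Documents", "test.user1", 500,
                   {"SPRequestGuid": "3f2a1b4c-0d9e-4f10-8a7b-6c5d4e3f2a1b"}),
    record_attempt("upload report.xlsx to Documents", "test.user1", 200,
                   {"sprequestguid": "9b8c7d6e-5f40-4a3b-8c2d-1e0f9a8b7c6d"}),
]
print(failure_rate(attempts))  # 0.5
```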

Collecting Diagnostic Information

SharePoint Online exposes diagnostic information through the Microsoft 365 admin center, the unified audit log, usage reports, and the SharePoint Online Management Shell. Administrators cannot enable verbose ULS tracing or download ULS log files as they can on premises. They can, however, search audit records, export configuration and usage data with PowerShell, and capture correlation IDs for Microsoft Support. Analyzing this data helps identify the factors contributing to problems.

Enabling diagnostic logging

SharePoint Online does not offer the verbose ULS trace logging settings found in on-premises Central Administration, and the SharePoint admin center has no diagnostic logging page. The closest tenant-side source of detail is the unified audit log in Microsoft Purview, which records SharePoint and OneDrive activities such as file access, sharing, and permission changes. Auditing is on by default for most organizations, but confirm this under Audit in the Microsoft Purview portal. Then search by date range, user, activity, or site and export the results. The Search-UnifiedAuditLog cmdlet in Exchange Online PowerShell runs the same kind of search from a script.
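Once audit results are exported, a short script can show which operations dominate on an affected site. This Python sketch assumes a CSV export with an `AuditData` column holding one JSON record per row. It also assumes that SharePoint records carry `Operation` and `SiteUrl` fields, falling back to `ObjectId`. Export formats have changed over time, so check the column and field names in your own file. The function name and sample rows are hypothetical.

```python
import csv
import io
import json
from collections import Counter


def count_site_operations(csv_text: str, site_url: str) -> Counter:
    """Count audit operations whose target URL lies under site_url.

    Assumes an 'AuditData' column containing a JSON object per row; the
    'SiteUrl', 'ObjectId', and 'Operation' keys are assumptions to verify.
    """
    prefix = site_url.lower().rstrip("/") + "/"
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = json.loads(row["AuditData"])
        target = (record.get("SiteUrl") or record.get("ObjectId") or "").lower()
        # Appending "/" lets ".../sites/hr" match while ".../sites/hrteam" does not.
        if (target + "/").startswith(prefix):
            counts[record.get("Operation", "Unknown")] += 1
    return counts


# Build a tiny hypothetical export; csv.writer quotes the JSON column safely.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["CreationDate", "UserIds", "Operations", "AuditData"])
for op, url in [("FileAccessed", "https://contoso.sharepoint.com/sites/hr/"),
                ("FileAccessed", "https://contoso.sharepoint.com/sites/hr/"),
                ("SharingSet", "https://contoso.sharepoint.com/sites/hr/"),
                ("FileAccessed", "https://contoso.sharepoint.com/sites/sales/")]:
    writer.writerow(["2024-05-01T10:00:00", "user@contoso.com", op,
                     json.dumps({"Operation": op, "SiteUrl": url})])

print(count_site_operations(buffer.getvalue(), "https://contoso.sharepoint.com/sites/hr"))
```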

Capturing ULS logs

The Unified Logging Service (ULS) logs record diagnostic and error information from across SharePoint's services, including correlation IDs, timestamps, and error messages. In SharePoint Online these logs stay on Microsoft's servers: administrators cannot download them, and ULS Viewer is a tool for on-premises farms. What administrators can capture is the correlation ID and time of each failure, which Microsoft Support uses to retrieve the matching ULS entries. The Get-SPOTenantLogEntry cmdlet in the SharePoint Online Management Shell returns only a limited set of company log entries (historically Business Connectivity Services events), not a general ULS export.

Using PowerShell to export reporting data

SharePoint Online Management Shell cmdlets can export configuration and usage data for the tenant. For example, Get-SPOSite returns site collections with properties such as storage usage, storage quota, sharing capability, and lock state, and Get-SPOTenant returns tenant-wide settings. Exporting this data lets administrators spot sites near their storage quota, locked sites, or unexpected sharing settings. Usage reports in the Microsoft 365 admin center add activity and storage trends that complement the error details gathered elsewhere.
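Exported site data is easiest to triage with a small script. The Python sketch below assumes the output of `Get-SPOSite -Limit All` was saved with `Export-Csv -NoTypeInformation`. This produces columns named after the site properties `Url`, `StorageUsageCurrent` and `StorageQuota` (both in megabytes), and `LockState`. The 90 percent threshold, the function names, and the sample rows are illustrative choices, not SharePoint defaults.

```python
import csv
import io


def sites_near_quota(csv_text: str, threshold: float = 0.9) -> list[tuple[str, float]]:
    """Return (url, fraction used) for sites at or above threshold, fullest first."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        quota_mb = float(row.get("StorageQuota") or 0)
        used_mb = float(row.get("StorageUsageCurrent") or 0)
        if quota_mb > 0 and used_mb / quota_mb >= threshold:
            flagged.append((row["Url"], round(used_mb / quota_mb, 3)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def locked_sites(csv_text: str) -> list[tuple[str, str]]:
    """Return (url, lock state) for sites whose LockState is not 'Unlock'."""
    return [
        (row["Url"], row["LockState"])
        for row in csv.DictReader(io.StringIO(csv_text))
        if row.get("LockState", "Unlock") != "Unlock"
    ]


# Hypothetical export with only the columns this sketch reads.
export = """Url,StorageUsageCurrent,StorageQuota,LockState
https://contoso.sharepoint.com/sites/hr,950,1000,Unlock
https://contoso.sharepoint.com/sites/sales,200,1000,ReadOnly
https://contoso.sharepoint.com/sites/legal,990,1000,NoAccess
"""
print(sites_near_quota(export))
print(locked_sites(export))
```

A real export contains many more columns; `csv.DictReader` simply ignores the ones the functions do not read.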

Analyzing Data to Pinpoint the Problem

By combining the error details, audit records, and usage reports they have collected, SharePoint Online administrators can gain a broader view of health issues and pinpoint problem areas. Analysis involves correlating errors and events across services over time, linked by correlation IDs and timestamps. Tracing a correlation ID shows where a single request failed, and the same ID lets Microsoft Support follow that request through the ULS logs. Review client-side logs for HTTP 429 and 503 responses, and usage reports for unusual activity, to identify the throttling or traffic patterns that drive failures.

Correlating events in the logs

Every request SharePoint processes is assigned a correlation ID, which appears on error pages, in the SPRequestGuid response header, and in the server-side ULS entries for that request. Administrators should group and sequence the entries they have collected by correlation ID to trace each request. Cross-reference the IDs attached to permission errors, search failures, or other faults to see which components failed the same request, and pass them to Microsoft Support when server-side detail is needed. Event correlation exposes the chain of faults behind user-facing issues.
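Grouping collected entries by correlation ID is a simple transformation. This Python sketch assumes the entries have already been gathered from error reports, network traces, or audit records into a common shape. The `LogEntry` fields and the component labels in the example are hypothetical, not a SharePoint log schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime
    correlation_id: str
    component: str  # hypothetical label, e.g. "Search" or "Permissions"
    level: str      # "Error", "Warning", or "Info"
    message: str


def group_by_correlation(entries: list[LogEntry]) -> dict[str, list[LogEntry]]:
    """Map each correlation ID (lowercased) to its entries in time order."""
    traces: dict[str, list[LogEntry]] = defaultdict(list)
    for entry in entries:
        traces[entry.correlation_id.lower()].append(entry)
    return {cid: sorted(trace, key=lambda e: e.timestamp) for cid, trace in traces.items()}


def failing_components(trace: list[LogEntry]) -> list[str]:
    """Components that logged an error for one request, in first-seen order."""
    return list(dict.fromkeys(e.component for e in trace if e.level == "Error"))


cid = "3f2a1b4c-0d9e-4f10-8a7b-6c5d4e3f2a1b"
entries = [
    LogEntry(datetime(2024, 5, 1, 10, 15, 3), cid.upper(), "Permissions", "Error", "Access denied"),
    LogEntry(datetime(2024, 5, 1, 10, 15, 1), cid, "Search", "Warning", "Query slow"),
    LogEntry(datetime(2024, 5, 1, 10, 15, 2), cid, "Search", "Error", "Query failed"),
]
traces = group_by_correlation(entries)
print(failing_components(traces[cid]))  # ['Search', 'Permissions']
```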

Identifying error patterns

Analyzing collected errors over time can reveal patterns that signal systemic issues before they degrade performance. Check whether the same operation fails repeatedly, such as workflows terminating unexpectedly, and keep the correlation ID of each failure, since every request receives its own ID. Plot daily or hourly counts of particular errors against traffic load to find peaks. A spike in access denied errors may follow a change to sharing or permission policies. Such patterns help narrow the problem to areas that can be addressed.
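One simple way to surface spikes is to count errors per hour and flag hours well above the typical level. The Python sketch below applies a "mean plus k standard deviations" rule over the hours between the first and last error, counting quiet hours as zero. This rule, the default k = 2, and the function names are illustrative choices, not a SharePoint feature. Short windows need several hours of data before any single hour can stand out.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean, pstdev


def hourly_error_counts(timestamps: list[datetime]) -> Counter:
    """Count errors per clock hour, filling quiet hours in the range with zero."""
    counts = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    if counts:
        hour, last = min(counts), max(counts)
        while hour <= last:
            counts.setdefault(hour, 0)  # insert zero-error hours explicitly
            hour += timedelta(hours=1)
    return counts


def spike_hours(counts: Counter, k: float = 2.0) -> list[datetime]:
    """Hours whose error count exceeds mean + k * population standard deviation."""
    if len(counts) < 2:
        return []
    values = list(counts.values())
    limit = mean(values) + k * pstdev(values)
    return sorted(hour for hour, n in counts.items() if n > limit)


# Hypothetical data: one error per hour from 08:00 to 15:00, plus nine more at 11:00.
start = datetime(2024, 5, 1, 8, 0)
errors = [start + timedelta(hours=h, minutes=5) for h in range(8)]
errors += [start + timedelta(hours=3, minutes=m) for m in range(10, 19)]
print(spike_hours(hourly_error_counts(errors)))  # [datetime.datetime(2024, 5, 1, 11, 0)]
```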

Checking correlation IDs

A correlation ID identifies one client request as it passes through SharePoint's services. When a page view or API call fails, administrators should note the correlation ID and time recorded at the point of failure. Then search the collected data for that ID, or give it to Microsoft Support to search the ULS logs, to uncover the events leading up to the failure and reveal its source.
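The lookback search described above can be sketched in a few lines of Python. This minimal example assumes collected entries are dictionaries with `timestamp`, `correlation_id`, `level`, and `message` keys, a hypothetical shape rather than a SharePoint export format. It returns the entries for one request up to and including its first error, limited to a chosen time window.

```python
from datetime import datetime, timedelta
from typing import Any


def events_before_failure(entries: list[dict[str, Any]], correlation_id: str,
                          lookback: timedelta = timedelta(minutes=5)) -> list[dict[str, Any]]:
    """Entries for one request, ending at its first error, within the lookback window."""
    wanted = correlation_id.lower()
    trace = sorted(
        (e for e in entries if e["correlation_id"].lower() == wanted),
        key=lambda e: e["timestamp"],
    )
    failure_index = next((i for i, e in enumerate(trace) if e["level"] == "Error"), None)
    if failure_index is None:
        return []  # no error recorded for this request
    window_start = trace[failure_index]["timestamp"] - lookback
    return [e for e in trace[: failure_index + 1] if e["timestamp"] >= window_start]


# Hypothetical collected entries.
cid = "9b8c7d6e-5f40-4a3b-8c2d-1e0f9a8b7c6d"
t0 = datetime(2024, 5, 1, 10, 0, 0)
entries = [
    {"timestamp": t0, "correlation_id": cid, "level": "Info", "message": "Request started"},
    {"timestamp": t0 + timedelta(seconds=50), "correlation_id": cid, "level": "Warning", "message": "Slow lookup"},
    {"timestamp": t0 + timedelta(seconds=55), "correlation_id": cid.upper(), "level": "Error", "message": "Timeout"},
    {"timestamp": t0 + timedelta(seconds=60), "correlation_id": cid, "level": "Info", "message": "Retry"},
    {"timestamp": t0 + timedelta(seconds=55), "correlation_id": "other", "level": "Error", "message": "Unrelated"},
]
for e in events_before_failure(entries, cid, lookback=timedelta(seconds=30)):
    print(e["timestamp"].time(), e["level"], e["message"])
```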

Resolving Common Issues

SharePoint Online relies on distributed cloud services working together to deliver functionality. Issues can arise from misconfigured permissions, overloaded resources, or synchronization problems in particular subsystems. Administrators should start with the common trouble areas: adjusting permissions, handling throttling, and troubleshooting synchronization failures between connected systems.

Fixing permission errors

Users frequently encounter access denied errors when they lack the permissions they expect. Use the Check Permissions command on the site's advanced permissions page to see a user's effective permission level and where it comes from. Rather than broadening the permission levels assigned to SharePoint groups, add the user to the group that grants the appropriate level. Also review unique permissions (broken inheritance) on the affected site, library, folder, or item, since these often explain why a user has access in one place but not another.

Handling throttling limits

SharePoint Online throttles clients that exceed usage limits in order to protect the reliability and performance of the shared service. Throttled requests receive HTTP 429 (Too Many Requests) or 503 (Server Unavailable) responses, usually with a Retry-After header. Tenant administrators cannot view or change these limits, and Get-SPOTenant does not expose throttling thresholds. Instead, identify the apps, scripts, or migration jobs generating the heavy traffic. Make them honor the Retry-After header, reduce their call frequency and concurrency, and decorate their traffic with a User-Agent string that identifies the application.
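Microsoft's guidance for throttled requests is to wait for the interval given in the Retry-After header before retrying. The Python sketch below computes only the wait time, so it can sit inside any HTTP client. It assumes Retry-After is given in seconds and falls back to capped exponential backoff with jitter when the header is missing or not a plain number. The function name and the base and cap values are illustrative assumptions.

```python
import random
from typing import Mapping, Optional

THROTTLED_STATUSES = {429, 503}  # Too Many Requests, Server Unavailable


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None if not throttled."""
    if status not in THROTTLED_STATUSES:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after", "").strip()
    if retry_after.isdigit():
        return float(retry_after)  # honor the service's requested wait exactly
    # No usable header: exponential backoff with "full jitter", capped at `cap`.
    return random.uniform(0, min(cap, base * 2 ** attempt))


print(retry_delay(429, {"Retry-After": "120"}, attempt=0))  # 120.0
print(retry_delay(404, {}, attempt=0))                       # None
```

The header value is honored even when it exceeds `cap`, because retrying earlier than the service asks tends to prolong throttling.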

Troubleshooting synchronization problems

SharePoint Online relies on background synchronization with connected services, such as Exchange Online or Project Online, and errors or latency in that synchronization can leave stale data in SharePoint. Tenant administrators generally have no direct view of these back-end synchronization logs. Instead, check service health for related advisories, note when the data last updated correctly, and compare the values in the source system with those shown in SharePoint to measure the lag before raising it with Microsoft Support.
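Comparing the source system with SharePoint is mostly a field-by-field diff plus a lag measurement. This Python sketch assumes both sides have already been exported to simple dictionaries and timestamps. The field names, record shapes, and function names are hypothetical and do not correspond to a specific Exchange, Project Online, or SharePoint API.

```python
from datetime import datetime, timedelta
from typing import Mapping, Optional


def stale_fields(source: Mapping[str, str],
                 sharepoint: Mapping[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Fields whose SharePoint value differs from the source: name -> (source, SharePoint)."""
    return {
        name: (value, sharepoint.get(name))
        for name, value in source.items()
        if sharepoint.get(name) != value
    }


def sync_lag(source_changed: datetime, sharepoint_updated: Optional[datetime],
             now: datetime) -> timedelta:
    """How long SharePoint lagged the source change: until it caught up, or until now."""
    if sharepoint_updated is not None and sharepoint_updated >= source_changed:
        return sharepoint_updated - source_changed
    return now - source_changed  # still stale


# Hypothetical task record that changed at 09:00 and has not reached SharePoint.
print(stale_fields({"Status": "Complete", "Owner": "Ana"},
                   {"Status": "In progress", "Owner": "Ana"}))
print(sync_lag(datetime(2024, 5, 1, 9), None, now=datetime(2024, 5, 1, 15)))  # 6:00:00
```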

Escalating Complex Issues

Despite thorough internal troubleshooting, some complex SharePoint Online problems persist without a clear resolution and need to be escalated to Microsoft Support. Provide all diagnostic data compiled during internal troubleshooting up front to help Microsoft engineers identify and fix the underlying fault sooner.

Engaging Microsoft Support

If administrators cannot resolve SharePoint issues through internal troubleshooting, they can open a case with Microsoft Support. In the Microsoft 365 admin center, go to Support > New service request (or Help & support), describe the problem, and state its business impact so the case receives an appropriate severity. Provide the detailed troubleshooting data collected and list the steps already attempted that did not resolve the problem.

Providing troubleshooting details

To help Microsoft Support engineers troubleshoot more quickly, SharePoint Online administrators should share all relevant details gathered during internal diagnosis: audit log exports, PowerShell reports, browser network traces, screenshots of the problem, and the correlation IDs and times of the failures. Describe all observations noted during testing to give Microsoft a head start on the underlying cause.
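Collecting the evidence into one consistent note reduces back-and-forth with support engineers. The Python sketch below renders a plain-text summary from a hypothetical `CaseEvidence` record whose fields follow the kinds of details described above. The layout is one reasonable choice, not a format Microsoft requires, and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class CaseEvidence:
    summary: str
    business_impact: str
    first_seen_utc: str = "unknown"
    correlation_ids: list[str] = field(default_factory=list)
    steps_tried: list[str] = field(default_factory=list)
    attachments: list[str] = field(default_factory=list)  # exports, traces, screenshots


def bullet_list(items: list[str]) -> list[str]:
    """Indented bullets, or a placeholder when nothing was recorded."""
    return [f"  - {item}" for item in items] or ["  - none recorded"]


def render_case_notes(evidence: CaseEvidence) -> str:
    """Plain-text notes to paste into a support request; duplicate IDs are dropped."""
    unique_ids = list(dict.fromkeys(cid.lower() for cid in evidence.correlation_ids))
    lines = [
        f"Summary: {evidence.summary}",
        f"Business impact: {evidence.business_impact}",
        f"First seen (UTC): {evidence.first_seen_utc}",
        "Correlation IDs:", *bullet_list(unique_ids),
        "Steps already tried:", *bullet_list(evidence.steps_tried),
        "Attachments:", *bullet_list(evidence.attachments),
    ]
    return "\n".join(lines)


evidence = CaseEvidence(
    summary="Uploads to the HR site fail with HTTP 500",
    business_impact="HR team cannot publish policy documents",
    first_seen_utc="2024-05-01T10:15Z",
    correlation_ids=["3F2A1B4C-0D9E-4F10-8A7B-6C5D4E3F2A1B",
                     "3f2a1b4c-0d9e-4f10-8a7b-6c5d4e3f2a1b"],
    steps_tried=["Retried with a second account", "Tested in another browser"],
)
print(render_case_notes(evidence))
```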

Granting permissions for debugging

Some complex issues require Microsoft engineers to examine tenant data or configuration directly. If the organization uses Customer Lockbox, an engineer who needs access to customer content must submit a request that a designated approver in the organization approves or denies, and approved access is time-limited. Grant only the access the case requires, following the process described in the support case and the agreement that governs the tenant.

Verifying the Solution

After Microsoft Support engineers provide fixes for SharePoint Online issues, administrators should retest the previously failing functionality under the same conditions that originally produced the problem. Keep monitoring service health notifications, and watch usage reports for irregular traffic patterns that could indicate the errors are recurring. Continue troubleshooting until the system runs stably again.

Retesting fixed functionality

Once Microsoft implements proposed resolutions for SharePoint failures, administrators need to directly revalidate the fixes by repeating original test case steps that demonstrated issues. Retest problematic workflows, search functionality, or page rendering using the same accounts and parameters. Only by directly verifying against past failed test results can administrators confirm issues are truly resolved.
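Comparing the retest against the original results is easier when both runs are recorded the same way. This Python sketch assumes each run is a mapping from a hypothetical test-case name to whether it passed. It also reports regressions and cases that were not rerun, since a fix that breaks something else, or a case nobody retested, is not a confirmed resolution.

```python
from typing import Mapping


def compare_runs(baseline: Mapping[str, bool],
                 retest: Mapping[str, bool]) -> dict[str, list[str]]:
    """Classify test cases by comparing the original run with the retest."""
    result: dict[str, list[str]] = {
        "fixed": [], "still_failing": [], "regressed": [], "not_rerun": [],
    }
    for case, passed_before in baseline.items():
        if case not in retest:
            result["not_rerun"].append(case)
        elif not passed_before and retest[case]:
            result["fixed"].append(case)
        elif not passed_before:
            result["still_failing"].append(case)
        elif not retest[case]:
            result["regressed"].append(case)
    return result


before = {"upload to HR library": False, "search for policy": False,
          "open home page": True, "start approval": False}
after = {"upload to HR library": True, "search for policy": False,
         "open home page": False}
print(compare_runs(before, after))
```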

Monitoring service health

Keeping watch on Office 365 Service Health notifications is critical for determining if addressed SharePoint Online problems or related issues reappear in the tenant or region. New service incident or advisory posts may signal previous fixes did not fully resolve root causes. Subscribe to health notifications and check service dashboards regularly after closing support cases to catch recurrence.

Adjusting troubleshooting as needed

If SharePoint issues resurface after an initial fix has been verified, administrators may need to reopen the service request or retrace earlier troubleshooting steps to uncover missed factors. Capture more detail around each recurrence, such as correlation IDs, exact times, and browser network traces. Widen the time ranges for audit log searches and usage reports to determine whether traffic or load spikes contribute to the recurrence.