Background
While working with Sitecore Support to troubleshoot a SQL session issue that we encountered on a high-traffic, scaled, Sitecore environment running Sitecore 8.1 Update 2, we discovered that the root cause of the issue was a connection leaking bug in the SessionStateStoreProvider that causes unnecessary load on SQL server making it unresponsive.The purpose of this post is to arm you with the information that you need to implement a stable SQL Session State in your Sitecore deployment.
Symptoms
When the issue / outage occurred, the exceptions in the Sitecore logs where the following: Exception: System.Data.SqlClient.SqlException
Message: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - No such host is known.)
Source: .Net SqlClient Data Provider
at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
at System.Data.SqlClient.SqlConnection.Open()
at System.Data.SqlClient.SqlConnection.Open()
at Sitecore.SessionProvider.Sql.SqlSessionStateStore.UpdateItemExpiration(Guid application, String id)
at Sitecore.SessionProvider.Sql.SqlSessionStateProvider.ResetItemTimeout(HttpContext context, String id)
at System.Web.SessionState.SessionStateModule.BeginAcquireState(Object source, EventArgs e, AsyncCallback cb, Object extraData)
at System.Web.HttpApplication.AsyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
Nested Exception
Exception: System.ComponentModel.Win32Exception
Message: No such host is known
Our New Relic Application monitoring system reported that the GetExpiredItemExclusive SQL stored procedure that was running against the session database was the most time consuming and was responsible for the highest throughput between our Sitecore and SQL server.We discovered that once the execution of the stored procedure got above the 10,000 calls per minute range, the application would start having trouble and would eventually stop responding.
Root Cause
It was determined that .NET was naturally creating a number of session state provider objects, and during high traffic periods, the number got so large that it caused too much load on the SQL server and eventually caused the application to stop responding.
Stabilizing Session State
The Patch
Sitecore Support issued us with a patch and noted that the issue was fixed in 8.2 Update 2 on.
From a high level, the change involved Sitecore using their own factory and creating session state objects manually.
The issue was registered as bug #98800. It's important to note that all prior versions will require a ticket to request the patch.
We implemented the patch by following these steps:
1) Put the attached 'Sitecore.Support.98800.dll' assembly to the /bin folder of the website.
2) Changed session state provider type from
type="Sitecore.SessionProvider.Sql.SqlSessionStateProvider, Sitecore.SessionProvider.Sql"
to
type="Sitecore.Support.SessionProvider.Sql.SqlSessionStateProvider, Sitecore.Support.98800"
The
change was made in both the Web.Config and Sitecore.Analytics.Tracking.config
Not Stable Yet
About 3 days later, our site was brought down to it's knees again. New Relic showed that the GetExpiredItemExclusive SQL stored procedure calls were well above the 10,000 calls per minute range.
Configuration Update
Working with Sitecore Support again, we increased the Session Provider polling interval from the default 2 seconds to 60 seconds and also increased the SQL connection timeout to 300 seconds.
The polling interval is basically the number of seconds to check the Session database for expired sessions. Under the covers, this would execute the GetExpiredItemExclusive SQL stored procedure.
The final configurations looked like these:
Sitecore.Analytics.Tracking.config
Sitecore.Analytics.Tracking.config
<add
name="mssql"
type="Sitecore.Support.SessionProvider.Sql.SqlSessionStateProvider, Sitecore.Support.98800"
connectionStringName="session"
pollingInterval="60"
compression="true"
sessionType="shared"/>
Web.Config
<add
name="mssql"
type="Sitecore.Support.SessionProvider.Sql.SqlSessionStateProvider, Sitecore.Support.98800"
connectionStringName="session"
pollingInterval="60"
compression="true"
sessionType="private"/>
ConnectionStrings.config
<add name="session"
connectionString="user id=xxx;password=xxx;Data Source=xxx,1433;Database=Sessions;MultiSubnetFailover=TRUE; Connection Timeout=300" />
Status
With the patch in place, and the final configuration updates, the application has been stable and has survived extremely high traffic days.
An example of one of these days: 40,000 requests per minute, 7500 simultaneous users and 142,000 page views per hour.
Takeaways
If you intend to use SQL Session State for your Sitecore implementation, and are running a version of Sitecore prior to 8.2 Update 2, you need to create a ticket with Sitecore support to request the patch.
After this, it's critical that you increase your polling interval configuration from the default 2 seconds to something higher like we did. 60 seconds seems to be the perfect number.
If you have any questions, feel free to submit a comment and I will help you out to the best of my knowledge about this issue.