Sunday, January 23, 2022

Sitecore Cleanup Monitor - Proactively keeping an eye on your Event Queue, History and Publish Queue tables

Standard

Background

There are several horror stories floating around the web about the Event Queue bringing Sitecore down to its knees.

Brian Pedersen
https://briancaos.wordpress.com/2016/08/12/sitecore-event-queue-how-to-clean-it-and-why/
https://briancaos.wordpress.com/2014/10/23/sitecore-eventqueue-deadlocks-how-to-solve-them-and-how-to-avoid-them/

Andy Cohen
https://blog.horizontalintegration.com/2016/02/09/sitecore-eventqueue-strikes-again/

I have experienced trouble myself
http://sitecoreart.martinrayenglish.com/2016/08/diagnosing-content-management-server.html

The Last Straw 

There is a bug in pre 8.1 U3 releases (I am on 8.1 U2) that will cause the Event Queue table in the Core database to be flooded with timestamp data from your Sitecore servers in a scaled environment.

The issue was related to the property:changed event that was being added into the Event Queue. Every 10 seconds each Sitecore Instance would use the SetTimestampForLastProcessing method.

There was no need to inform other instances about the update in last processed stamp of local instance, and Sitecore Support provided me with a patch where they simply used the event disabler to fix the issue.

Here is a copy of the patch for download if you are having this problem: https://www.dropbox.com/s/lpjhil5rf9dri0n/Sitecore.Support.99697.zip?dl=0

After experiencing this and other problems in the past, I decided to take action.

Sitecore Cleanup Monitor Module 

The Event Queue was my initial focus, but per Sitecore's Performance Tuning Guide, in order to keep Sitecore running optimally, we need to keep the Event Queue, History and Publish Queue tables below 1000 rows: https://sdn.sitecore.net/upload/sitecore7/70/cms_tuning_guide_sc70-72-a4.pdf. The reason behind this is due to SQL deadlocking: https://technet.microsoft.com/en-us/library/ms177433(v=sql.105).aspx.

With all this being said, I decided to put together a module that would keep an eye on these key tables.

The module consists of 3 agents that will monitor the Event Queue, Publish Queue and History tables to ensure that they don't exceed a set threshold.



Why would you use it?

In many cases, Sitecore's default cleanup agents just aren't efficient enough in cleaning up these key Sitecore tables.

This module allows you to be proactive instead of reactive, so that you don't have to log into your SQL instance to manually run queries to clean up your tables, usually after the $#!,$h has hit the fan.

How does it work? 

When due, the agent will check the row count of the target table in each database (core, master and web), and if the count is above the set threshold, it will remove the oldest rows, bringing the row count down to the threshold. It won't do anything to tables with row counts that are below the threshold.

You can set how often you want each agent to run, and what you want your threshold / table row count to be. You also don't need to use all three agents. If you only want to monitor the Event Queue for example, simply comment or remove the other agents from the module's config file.

You can monitor it's activity be examining your Sitecore logs. Here is a snapshot example:


Installation and Configuration

Documentation, full source code and package download is available from my GitHub repository: https://github.com/martinrayenglish/Sitecore.Cleanup

The module is available on the Sitecore Marketplace: https://marketplace.sitecore.net/Modules/S/Sitecore_Cleanup_Monitor.aspx

Monday, January 3, 2022

Fix Email Campaign Pausing: Sitecore Email Experience Manager 3.x Retry Data Provider

Standard

Background

My company uses Email Experience Manager (EXM) to send several million emails a day, and we have been facing issues where our large campaigns would pause mid-send.

We have a scaled EXM environment with 2 dedicated dispatch servers, and a separate SQL Server, all with appropriate resources so the hardware was not an issue. We also ensured that databases were kept in tiptop condition (proper maintenance plans with stats being updated), and configurations where optimal for our environment.


The causing of the pausing

After digging in, I discovered that the pausing was caused by SQL deadlocks due to the massive amount of records and CRUD activity on the EXM SQL databases.

Sample Exception:

 ERROR Transaction (Process ID 116) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.  
 Exception: System.Data.SqlClient.SqlException  
 Message: Transaction (Process ID 116) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.  
 Source: .Net SqlClient Data Provider  
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)  
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)  
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)  
   at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)  
   at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)  
   at System.Data.SqlClient.SqlDataReader.Read()  
   at System.Data.SqlClient.SqlCommand.CompleteExecuteScalar(SqlDataReader ds, Boolean returnSqlValue)  
   at System.Data.SqlClient.SqlCommand.ExecuteScalar()  
   at Sitecore.Modules.EmailCampaign.Core.Data.SqlDbEcmDataProvider.CountRecipientsInDispatchQueue(Guid messageId, RecipientQueue[] queueStates)  
   at Sitecore.Modules.EmailCampaign.Core.Gateways.DefaultEcmDataGateway.CountRecipientsInDispatchQueue(Guid messageId, RecipientQueue[] queueStates)  
   at Sitecore.Modules.EmailCampaign.Core.Analytics.MessageStatistics.get_Unprocessed()  
   at Sitecore.Modules.EmailCampaign.Core.Analytics.MessageStatistics.get_Processed()  
   at Sitecore.Modules.EmailCampaign.Core.MessageStateInfo.InitializeSendingState()  
   at Sitecore.Modules.EmailCampaign.Core.MessageStateInfo.InitializeMessageStateInfo()  
   at Sitecore.Modules.EmailCampaign.Factory.GetMessageStateInfo(String messageItemId, String contextLanguage)  
   at Sitecore.EmailCampaign.Server.Services.MessageInfoService.Get(String messageId, String contextLanguage)  
   at Sitecore.EmailCampaign.Server.Controllers.MessageInfo.MessageInfoController.MessageInfo(MessageInfoContext data)  
   

How does this new data provider fix the problem?

The new data provider introduces efficient SQL deadlock handling. When a deadlock is detected, it will wait 5 seconds and then retry the transaction. The code will try to execute a deadlocked transaction 3 times.

Configuration

Defaults are set to wait 5 seconds for the retry, and the max retry attempts is 3. The DelaySeconds and RetryCount settings can be modified to suit your needs.

 <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">  
  <sitecore>  
   <ecmDataProvider defaultProvider="sqlretry">  
    <providers>  
     <clear/>  
     <add name="sqlretry" type="Sitecore.EmailCampaign.RetryDataProvider.RetrySqlDbEcmDataProvider, Sitecore.EmailCampaign.RetryDataProvider" connectionStringName="exm.master">  
      <Logger type="Sitecore.ExM.Framework.Diagnostics.Logger, Sitecore.ExM.Framework" factoryMethod="get_Instance"/>  
      <DelaySeconds>5</DelaySeconds>  
      <RetryCount>3</RetryCount>  
     </add>  
     <add name="sqlbase" type="Sitecore.Modules.EmailCampaign.Core.Data.SqlDbEcmDataProvider, Sitecore.EmailCampaign" connectionStringName="exm.master">  
      <Logger type="Sitecore.ExM.Framework.Diagnostics.Logger, Sitecore.ExM.Framework" factoryMethod="get_Instance"/>  
     </add>  
    </providers>  
   </ecmDataProvider>  
  </sitecore>  
 </configuration>  

Source Code and Documentation

Full source code, documentation and package download is available from my GitHub repository:

https://github.com/martinrayenglish/Sitecore-EXM-3.x-Retry-Data-Provider