Saturday, September 21, 2019

Sitecore xDB - Troubleshooting xDB Index Rebuilds on Azure

Standard
In my previous post, I shared some important tips to help ensure that if you are faced with an xDB index rebuild, you can get it done successfully and as quickly as possible.

I mentioned a lot of things in the post, but now, I want to mention common reasons where and why things can go wrong, and highlight the most critical items that impact the rebuild speed and stability.


Causes of Need To Rebuild xDB Index

Your xDB relies on your shard database's SQL Server change tracking feature in order to ensure that it stays in sync. This basically determines how long changes are stored in SQL. As mentioned in Sitecore's docs, the Retention Period setting is set to 5 days for each collection shard. 

So, why would 5-day old data not be indexed in time?
  • The Search Indexer is shut down for too long
  • Live indexing is stuck for too long
  • Live indexing falls too far behind

Causes of Indexing Being Stuck or Falling Behind, and Rebuild Failures

High Resource Utilization: Collection Shards 
99% of the time, this is due to high resource utilization on your shard databases. Basically, if you see your shard databases hitting above 80% DTUs, you will run into this problem.

High Resource Utilization: Azure Search or Solr
If you have a lot of data, you need to scale your Azure Search Service or Solr instance.  Sharding is the answer, and I will touch in this further down.

What to check?

If you are on Azure, make sure your xConnect Search Indexer WebJob is running.
Most importantly, check your xConnect Search Indexer logs for SQL timeouts. 

On Azure, the Webjob logs are found in this location: D:\local\Temp\jobs\continuous\IndexWorker\{randomjobname}\App_data\Logs"

Key Ingredients For Rebuild Indexing Speed and Stability

SQL Collection Shards

Database Health 

Maintaining the database indexes and statistics is critically important. As I mentioned in my previous post:  "Optimize, optimize, optimize your shard databases!!!" 

If you are preparing for a rebuild, make sure that you run the AzureSQLMaintenance Stored Procedure on all of your shard databases.

Database Size

The amount of data and the number of collection shards is directly related to resource utilization and rebuild speed and stability. 

Unfortunately, there is no supported way to "reshard" your databases after the fact. We are hoping this will be a feature that is added to a future Sitecore release.

xDB Search Index

Similarly to the collection shards, the amount of data and the number of shards is directly related to resource utilization on both Azure Search and Solr. 

Specifically on Solr, you will see high JVM heap utilization.

If your rebuilds are slowing down or failing, or even if search performance on your xDB index is deteriorating, it's most likely due to the amount of data in your index, the number of shards and distribution amongst nodes that you have set up.  

Search index sharding strategies can be pretty complex, and I might touch on in these in a later post.

Reduce Your Indexer Batch Size

Another item that I mentioned in my previous post. If you drop this down from 1000 to 500 and you are still having trouble, reduce it even further. 

I have dropped the batch size to 250 on large databases to reduce the chance of timeouts (default is 30 seconds) when the indexer is reading contacts and interactions from the collection shards.


Saturday, August 31, 2019

Sitecore xDB - Optimizing Your xDB Index Rebuild For Speed

Standard
Having performed Sitecore xDB index rebuilds many times with large data sets, I wanted to share some key tips to ensure a successful and speedy rebuild.

My experience has been on Azure PaaS, with both Azure Search and SolrCloud, but these techniques can be applied to on-premise and IaaS as well.

Change The Log Level

Before starting a large rebuild job, it's important to enable the proper logging in case you need to investigate any issues that may arise. By default, the log level is set to Warning. Change it to Information.

Navigate to the: App_Data\jobs\continuous\IndexWorker\App_data\config\sitecore\CoreServices\sc.Serilog.xml file, and change the MinimumLevel to Information: 
   <MinimumLevel>
      <Default>Information</Default>
    </MinimumLevel>

Optimize Your Indexer Batch Size

Don't be over eager with your indexer's batch size setting. This setting determines how many contacts or interactions are loaded per parallel stream during an index rebuild. This setting is found in the following location:
App_Data\jobs\continuous\IndexWorker\App_data\config\sitecore\SearchIndexer\sc.Xdb.Collection.IndexerSettings.xml file:
<BatchSize>1000</BatchSize>

I have had success reducing this size to 500, as it helps execution go faster and prevents you from hitting timeouts during your rebuild when your shard databases are under heavy load.

Reduce Your SplitRecordsThreshold

A good tip from a member of our Sitecore community. By default, this value is set to 25000. Tweaking and reducing this value can also make your rebuilds run faster, and improve your live indexing.

Like the Indexer Batch size, this setting is found in the following location:
App_Data\jobs\continuous\IndexWorker\App_data\config\sitecore\SearchIndexer\sc.Xdb.Collection.IndexerSettings.xml
<SplitRecordsThreshold>25000</SplitRecordsThreshold>

I have had success reducing this size to a little less than half the original value, 12000.

Reduce Your Index Writer's ParallelizationDegree

To decrease load in your Search Service, another good suggestion is to decrease the PrallelizationDegree Setting which is 4 by default.

You will see this in many of the other configs in your xConnect App Services. This setting determines how many parallel streams of data can be processed at the same time.

This setting is found in this location for Solr:
\App_Data\jobs\continuous\IndexWorker\App_data\config\sitecore\SearchIndexer\ sc.Xdb.Collection.IndexWriter.SOLR.xml
 <ParallelizationDegree>4</ParallelizationDegree>​

This setting is found in this location for Azure Search:
\App_Data\jobs\continuous\IndexWorker\App_data\config\sitecore\SearchIndexer\ sc.Xdb.Collection.IndexWriter.AzureSearch.xml
 <ParallelizationDegree>4</ParallelizationDegree>​

I have seen large improvements reducing this value from 4 to 1.

Optimize Your Databases

Optimize, optimize, optimize your shard databases!!!

If you are not already using the AzureSQLMaintenance Stored Procedure on your Sitecore databases, do it today!

This is critically important for not only your shard databases but also your other Sitecore databases like Core, Master and Web.  Marketing Automation and Reference databases also get hammered pretty hard, so make sure that this gets applied and run regularly on these.

Note that this maintenance is 100% necessary. As Grant Killian says: "The expectation that Azure SQL is a 'fully managed solution' is somewhat misleading, as rebuilding query stats and defragmentation are the user’s responsibility."

Sitecore recommends an approach like this: https://techcommunity.microsoft.com/t5/Azure-Database-Support-Blog/Automating-Azure-SQL-DB-index-and-statistics-maintenance-using/ba-p/368974

Schedule an Azure Automation “Runbook” to attend to this after hours for all Sitecore databases.

Run the Rebuild

After all this things have been completed, run the xDB rebuild as Sitecore's docs mentions.  In Kudu, go to site\wwwroot\App_data\jobs\continuous\IndexWorker and execute this command:

.\XConnectSearchIndexer.exe -rr

After this, you will see the magic start with docs in your inactive / rebuild index first be reset to 0, and then counts start gradually increasing.

Unfortunately, there is no way to see how far along you are. There is however a query that you can run against your Azure Search or Solr indexes to check the status.

Azure Search Query
$filter=id eq 'xdb-rebuild-status'&$select=id,rebuildstate

Solr Query
id:"xdb-rebuild-status"

Returned Index Rebuild Status:
Default = 0
RebuildRequested = 1
Starting = 2
RebuildingExistingData = 3
RebuildingIncomingChanges = 4
Finishing = 5
Finished = 6

Rebuild Success

When you run the status query, and you see a "6" (sometimes you will see "5" returned, and that's ok too),  you have a successfully completed the rebuild, and new data will start flowing into your xDB index.

Friday, August 9, 2019

Demystifying Pools and Threads to Optimize and Troubleshoot Your Sitecore Application

Standard

Background

If you are .NET application developer that works on Sitecore or not, it is important to have an understanding of how the Microsoft .NET Common Language Runtime (CLR) Thread Pool works will help you determine how to configure your application for optimal performance and help you troubleshoot issues that may present themselves in high traffic production environments.

This topic has been of great interest to me, and it's understanding has helped me troubleshoot and solve many difficult problems within the Sitecore realm.

I am hoping that this post helps other fellow Sitecore developers who may not be as familiar with the inner workings of the .NET CLR and Thread Pool, to have a starting pointing to understand where potential threading issues may occur if the application you support shows symptoms similar to what I intend to discuss.



Thread Pool and Threads 

To put it simply, a thread pool is a group of warmed up threads that are ready to be assigned work to process. 

The CLR Thread Pool contains 2 types of threads that have different roles.

1) Worker Threads 

Worker threads are threads that process HTTP requests that come into your web server - basically they handle and process your application's logic. 

2) Input/Output (I/O) Completion Port or IOCP Threads 

These threads handle communication from your application's code to a network type resource, like a database or web service.

There is really no technical difference between worker threads and IOCP threads. The CLR Thread Pool keeps separate pools of each simply to avoid a situation where high demand on worker threads exhausts all the threads available to dispatch native I/O callbacks, potentially leading to a deadlock. However, this can still occur under certain circumstances.

Out of the Box / Default Thread Pool Thread Counts 

Minimums 

By default, the number of Worker and IOCP threads that your Thread Pool will have ready for work is determined by the number of processors your server has.

Min Formula: Processor Count =  Thread Pool Worker Threads = Thread Pool IOCP Threads

Example: If you have a server with 8 CPUs, you will start with only 8 worker and 8 IOCP threads.

Maximums 

By default, the maximum number of Worker and IOCP threads is 20 per processor.

Max Formula: Processor Count * 20 =  Max Thread Pool Worker Threads = Max Thread Pool IOCP Threads

Example:  If you have a server with 8 CPUs, the default max worker and IOCP threads will be 20 x 8 = 160.

Safety Switch 

The Thread Pool WILL NOT inject new threads when the CPU usage is above 80%. This is a safely mechanism to prevent overloading the CPU.

The Thread Pool In Action

As requests come into your web server, the Thread Pool will inject new worker or I/O completion threads when all the other threads are busy until it reaches the "Minimum" number for each type of thread.

After this "Minimum" has been reached, the Thread Pool will throttle the rate at which it injects new threads and will only add or remove 1 thread per 500ms / 2 threads per second, or as a thread has completed work and becomes free, whatever comes first.

Through its "hill climbing technique algorithm", it is self-tuning and will stop adding threads and remove them if they are not actually helping improve throughput. The thread injection will continue while there is still work to be done until the "Maximum" number for each thread type has been reached.

As the number of requests is reduced, the threads in the Thread Pool start timing out waiting for new work (if an existing thread stays idle for 15 seconds), and will eventually retire themselves until the pool shrinks back to the minimum.

"Bursty" Web Traffic, Thread Starvation and 503 Service Unavailable

Let's say you have your Sitecore site running on an untuned, single Content Delivery server that has 8 processors with the default Thread Pool thread settings. For the sake of the simple example, let's assume we have an under-powered web service (perhaps used for looking up customer information from a backend CRM system) that under heavy load takes 5 seconds to provide a response to a request. Our developers have not implemented asynchronous programming in this example, and use the HttpWebRequest class.

We start out with 8 warmed up and ready worker and IOCP threads in our Thread Pool.

Now, lets say we have burst of 100 visitors accessing different pages (pages that consume the web service) on our site at the same time. The Thread Pool will quickly assign the 8 threads to handle the first 8 requests that will be busy for the next 5 seconds, while the other 92 sit in a queue. As you can see, it will take many 500ms intervals to catch up with the workload. IIS will wait some time for the threads to get free, so that the requests in queue can be processed. If any thread gets free in the waiting time, then it will be used to process the request. Otherwise IIS will return a 503 Service Unavailable error message. Both the slow web service and the untuned Thread Pool will result in some unhappy visitors seeing the 503 error message.

Looking at this a bit closer, a call to a web service uses one worker thread to execute the code that sends the request and one IOCP thread to receive the callback from the web service. In our case, the Thread Pool is completely saturated with work, and so the callback can never get executed because the items that were queued in the thread pool were blocked.

This problem is called Thread Pool Starvation - we have a "hungry" queue waiting to be served threads from the pool to perform some work, but none are available.

This example is a good reason for using asynchronous programming. With async programming, threads aren’t blocked while requests are being handled, so the threads would be freed up almost immediately.

Optimizing Thread Settings 

The ability to tune / manage thread settings has been available in the .NET framework for ages - since v1.1 actually.

Arguably, the most important settings are the minWorkerThreads and minIOThreads where you can specific the minimum number of threads that are available to your application's Thread Pool out of the gate (overriding the default formula's based on processor count as described above).

Threads that are controlled by these settings can be created at a much faster rate (because they are spawned from the Thread Pool), than worker threads that are created from the CLR's default "thread-tuning" capabilities - 1 thread per 500ms / 2 threads per second when all available threads in the pool are busy.

These and other important thread settings can be set in either your server's machine configuration file (in the \WINDOWS\Microsoft.Net\Framework\vXXXX\CONFIG directory) or with the Thread Pool API.

Beware: Out-of-Process Session State and Redis Client  

Out-of-Process Session State

If you are using Out-of-Process Session State in your Sitecore environment, you need to tune your Thread Pool!

Each of your Sitecore Content Delivery instances are individually configured to query expired sessions from your session store. This mechanism will add a ton of additional request overhead to your CD instances, and if your Thread Pools aren't tuned to handle this, you will find yourself in a Thread Starvation situation.

For more background on how and why this happens, please check out Ivan Sharamok's great post: http://blog.sharamok.com/2018-04-07/prepare-cd-for-experience-data-collection

Redis Client

If you are running your Sitecore environments on Microsoft Azure, you will be using Redis for session management. Sitecore makes use of the StackExchange.Redis client within the platform. Even though the client is built for high performance, it get's finicky if your Thread Pool threads are all busy, the "minimum" has been reached and thread injection slows down. You will start seeing Redis service request timeouts.

It is important for you to go through a Thread Pool tuning exercise to ensure that you don't run into Thread Starvation issues.

The nice thing is that the client prints Thread Pool statistics to your logs with details about worker and IOCP threads, to help you with your tuning exercise.

For more details, follow this Microsoft Redis FAQ link: https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-faq#important-details-about-threadpool-growth

Self-adjusting Thread Settings 

Lucky for us on Sitecore 9 and above, there is a pipeline processor that allows the application to adjust thread limits dynamically based on real-time thread availability (using the Thread Pool API).

By default, every 500 milliseconds, the processor will keep adding 50 to the minWorkerThreads setting via the Thread Pool API until it determines that the minimum number of threads is adequate based on available threads.

In my next post, I intend to explore this processor in detail and provide information on it's self-tuning abilities.

Thursday, May 16, 2019

Going to Production with Sitecore 9.1 on Azure PaaS: Critical Patches Required For Stability

Standard
After spending several months upgrading our custom solution to Sitecore 9.1, and launching on Azure PaaS, I have learned a lot about what it takes to eventually see the sunshine between those stormy clouds.

This is the first of a series of posts intended to help you and your team make the transition as smooth as possible.



Critical Patches

There are several patches and things that you will need to deploy that are imperative to your success on Azure PaaS.


High CPU - Excessive Thread Consumption

Sitecore traditional server roles (Content Management, Content Delivery etc) operate in a synchronous context while xConnect operations are asynchronous. Therefore, communication between your Sitecore traditional servers and xConnect are performed in a synchronous to asynchronous context.

This sync to async operation requires double the number of threads on the sync side in order to do the job.  This could result in there not being enough threads available to unblock the main thread.

Sitecore handled this excessive threading problem in their application code by building a custom thread scheduler. What this does is take advantage of a blocked thread to execute the operation, thus reducing the need for the additional thread, and making this synchronous to asynchronous context more efficient.

Great stuff right? Well, the problem that everyone will be faced with is that if you are not using an exact version of the System.Net.Http library, this thread scheduler simply doesn't work!

New versions of System.Net.Http don't respect the custom thread schedulers that Sitecore has built.

With the configurations that are shipped with Sitecore 9.x, the application uses the Global Assembly Cache to reference System.Net.Http, and 9 times out of 10, it will be a newer version of this library.

Without this thread scheduler working, you will end up with high CPU due to thread blocking, and your application will start failing to respond to incoming http requests.

In my case, I saw blocking appear in session end pipelines, and also in some calls on my Content Management server when working with EXM and contacts.

More detail about his issue, and the fix is described in this article: https://kb.sitecore.net/articles/327701

When you read the article, you would think that it doesn't apply to you because it is referring to .NET 4.7.2, and if you are working with Sitecore 9.x, the application ships using 4.7.1.

The truth is that it does! You need to perform the following actions in order to fix the threading problem:

1. Apply the binding redirect to your web.config to force Sitecore to use System.Net.Http version 4.2.0.0 mentioned in the article:


2. Deploy the System.Net.Http version 4.2.0.0 to the bin folder on all your traditional Sitecore instances.

NOTE: Make sure you remove any duplicate System.Net.Http binding redirect entries in your web.config, and that you only have the one described above.

Reference Data

First Issue

This first patch you need adds the ability to configure cache sizes and expiration time for the UserAgentDictionaryCache, ReferringSitesDictionary, and GeoIpDataDictionary, and the size for ReferenceDataClientDictionary cache. Without this patch, you will see high DTU (up to 100%) in your Reference Data database as there is a bug that allows the cache size to grow enormously, which leads to performance issues and shutdowns.

In order to fix the issue, you need to review the following KB article: https://kb.sitecore.net/articles/067230

In our 9.1 instance, I used the 9.0.1.2 version of the patch.

Second Issue

This first patch is not enough to fix your Reference Data woes. There is another set of Stored Procedure performance issues related to SQL when querying the Reference Data database. 

You will need to download and execute the following SQL scripts in order to fix this issue:
GetDefinitions.sql and SaveDefinitions.sql

Update 08/17/19
Sitecore provided an improved SQL script as the original scripts could lead to issues with the related operations in some scenarios (e.g. batch operations with GetDefinions returning only the first result).

Download the updated script here.

Redis Session Provider

First Issue

If you are on Azure PaaS, you will most definitely using Redis as your Out of Proc Session State Provider.

Patch 210408 is critical for the stability of session state in your environment https://kb.sitecore.net/articles/464570

This patch limits the number of worker threads per CPU core and also reserves threads so they can handle session end requests/threads with the least amount of delay as possible. Reading between the lines, this patch simply handles the Redis timeout issue more gracefully.

Without this, you will see session end events using all the threads and leaving no room to handle incoming http requests. After hanging for some time, they eventually end up with 502 error due to a timeout.

After applying the patch, the timeout settings referenced in this KB article will need to be made in both your web.config and Sitecore.Analytics.Tracking.config. You also want to update your pollingInterval to 60 seconds to reduce the stress on your Redis instance as well.

Note: Depending on how much traffic your site takes on, you may need to adjust the patch settings in order to free up more threads.

So for example, you can take the original settings, and add a multiplication factor of 3 or 4. As I mentioned before, this will be up to you to determine, based on your experienced load.

Example with multiplication factor of 3:


For my shared session tracker update, I created a patch file like the following:


Second Issue

Gabe Streza has a great post regarding the symptoms experienced when Redis instances powering your session state are under load: https://www.sitecoregabe.com/2019/02/redis-dead-redemption-redis-cache.html

It's important to read through his post, and also Sitecore's KB article: https://kb.sitecore.net/articles/858026

What both are basically saying is that you will need to create a new Redis instance in Azure, so that you can split your private sessions and shared sessions. So, to be clear, you will have one Redis Instance to handle private sessions and another to handle shared sessions.

I decided to keep my existing Redis instance to handle shared sessions, and used the new Redis instance to handle private sessions.

Similar to Gabe's steps, I created a new redis.sessions.private entry in the ConnectionString.config.

I then updated my Session State provider in my web.config to the following:

Final Thoughts 

These fixes have made a night and day difference on the stability of our high traffic 9.1 sites on Azure PaaS.

Feel free to reach out to me on Twitter or Sitecore Slack if you have any questions.

Monday, January 21, 2019

Improving the Sitecore Broken Links Removal Tool

Standard

Background

While working through an upgrade to Sitecore 9.1, I ran into a broken links issues that couldn't be resolved using Sitecore's standard Broken Links Removal tool.

While searching the internet, I was able to determine that I wasn't the only one that faced these types of issues.

In this post, I intend to walk you through the link problems that I ran into, and why I decided to create an updated Broken Links Removal tool to overcome the issues that the standard links removal tool wasn't able to resolve.

NOTE: The issues that I present in this post are not specific to version 9.1.  They exist in Sitecore versions going back to 8.x.


Exceptions after Upgrade Package Installation

After installing the 9.1 upgrade package and completing the post installation steps of rebuilding the links database and publishing, I discovered that lots of my site's pages started throwing the following exceptions:



The model item passed into the dictionary is of type 'Castle.Proxies.IGlassBaseProxy', but this dictionary requires a model item of type 'My Custom Model'.

During the solution upgrade, I had upgraded to Glass Mapper to version 5, so I thought that the issue could be related to this.  After digging in, I noticed that my items / pages that were throwing exceptions had broken links.  I determine this by turning on Broken Links using the Sitecore Gutter in the Content Editor.


Next, I attempted to run Broken Links Removal tool located at http://{your-sitecore-url}/sitecore/admin/RemoveBrokenLinks.aspx.

After it had run for several minutes, it threw the following exception:

ERROR Error looking up template field. Field id: {00000000-0000-0000-0000-000000000000}. Template id: {128ADD89-E6BC-4C54-82B4-A0915A56B0BD}
Exception: System.ArgumentException
Message: Null ids are not allowed.
Parameter name: fieldID
Source: Sitecore.Kernel
   at Sitecore.Diagnostics.Assert.ArgumentNotNullOrEmpty(ID argument, String argumentName)
   at Sitecore.Data.Templates.Template.DoGetField(ID fieldID, String fieldName, Stack`1 stack)
   at Sitecore.Data.Templates.Template.GetField(ID fieldID)

Digging In

I needed to understand why this exception was being thrown, and started down the path of decompiling Sitecore's assemblies.  My starting point for reviewing the code was Sitecore.sitecore.admin.RemoveBrokenLinks.cs which is the code behind for the Broken Links Removal page.

I took all the code and pasted it into my own ASPX page so that I could throw in a break point and debug what was going on.  After a lot of trial and error and a ton of logging,  I discovered that code that was throwing the error existed in the FixBrokenLinksInDatabase method on line 11 shown below:

If the Source Field ID / "itemLink.SourceFieldID" on line 11 is null (this is the field where it has determined that there is a broken link), the exception noted above will be thrown.

The Cause of the Null Source Field

During my investigation, I determined that the cause of this field being null was due to the item being created from a branch template that no longer existed.

To put this another way, the target item represented as the sourceItem in the code above (line 8), had a reference to a branch template that no longer existed, and the lookup for item was returning a null source field.

Through my code logging and Content Editor validation, I found that we had a massive amount of broken links caused by a developer deleting several EXM branch templates:



Stack Exchange and Sitecore Community uncovered some decent information regarding this type of issue, and how to solve it manually by running a SQL query:

https://community.sitecore.net/developers/f/8/t/1784

https://sitecore.stackexchange.com/questions/88/how-do-i-fix-a-broken-created-from-reference-when-the-branch-no-longer-exists/89

Now, to fix this problem automatically using the tool, I just needed to add a null check in the code, and also create a way to clean up the references to the invalid branch templates.

Improved Broken Links Tool

The outcome of my work was an improved Broken Links Removal tool that I call the "Broken Links Eraser".

The tool does everything that the Sitecore Broken Links Removal tool does, with the following improvements:

  • Detects and removes item references to branch templates that no longer exist.
  • Removes all invalid item field references to other items (inspects all fields that contain an id).
  • Allows you to target broken links using a target path, you don't have to run through every item in the target database. This is useful when working with large sets of content.
  • Has detailed logging while it is running and feedback after it has completed. 

The tool is built as a standalone ASPX page, so you can simply drop the file in your {webroot}/sitecore/admin folder to use it. No need to deploy assemblies and recycle app pools etc.


All updates were made using Sitecore's SqlDataApi, so the code is consistent with Sitecore's standards. The code is available on GitHub for you to download and modify as needed:



Final Thoughts

I hope that you find this tool useful in solving your broken link issues. Please feel free to add comments or contact me with any questions on either Sitecore Slack or Twitter.


Monday, December 3, 2018

Fix Email Campaign Pausing: Sitecore Email Experience Manager 3.x Retry Data Provider

Standard

Background

My company uses Email Experience Manager (EXM) to send several million emails a day, and we have been facing issues where our large campaigns would pause mid-send.

We have a scaled EXM environment with 2 dedicated dispatch servers, and a separate SQL Server, all with appropriate resources so the hardware was not an issue. We also ensured that databases were kept in tiptop condition (proper maintenance plans with stats being updated), and configurations where optimal for our environment.


The causing of the pausing

After digging in, I discovered that the pausing was caused by SQL deadlocks due to the massive amount of records and CRUD activity on the EXM SQL databases.

Sample Exception:

 ERROR Transaction (Process ID 116) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.  
 Exception: System.Data.SqlClient.SqlException  
 Message: Transaction (Process ID 116) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.  
 Source: .Net SqlClient Data Provider  
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)  
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)  
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)  
   at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)  
   at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)  
   at System.Data.SqlClient.SqlDataReader.Read()  
   at System.Data.SqlClient.SqlCommand.CompleteExecuteScalar(SqlDataReader ds, Boolean returnSqlValue)  
   at System.Data.SqlClient.SqlCommand.ExecuteScalar()  
   at Sitecore.Modules.EmailCampaign.Core.Data.SqlDbEcmDataProvider.CountRecipientsInDispatchQueue(Guid messageId, RecipientQueue[] queueStates)  
   at Sitecore.Modules.EmailCampaign.Core.Gateways.DefaultEcmDataGateway.CountRecipientsInDispatchQueue(Guid messageId, RecipientQueue[] queueStates)  
   at Sitecore.Modules.EmailCampaign.Core.Analytics.MessageStatistics.get_Unprocessed()  
   at Sitecore.Modules.EmailCampaign.Core.Analytics.MessageStatistics.get_Processed()  
   at Sitecore.Modules.EmailCampaign.Core.MessageStateInfo.InitializeSendingState()  
   at Sitecore.Modules.EmailCampaign.Core.MessageStateInfo.InitializeMessageStateInfo()  
   at Sitecore.Modules.EmailCampaign.Factory.GetMessageStateInfo(String messageItemId, String contextLanguage)  
   at Sitecore.EmailCampaign.Server.Services.MessageInfoService.Get(String messageId, String contextLanguage)  
   at Sitecore.EmailCampaign.Server.Controllers.MessageInfo.MessageInfoController.MessageInfo(MessageInfoContext data)  
   

How does this new data provider fix the problem?

The new data provider introduces efficient SQL deadlock handling. When a deadlock is detected, it will wait 5 seconds and then retry the transaction. The code will try to execute a deadlocked transaction 3 times.

Configuration

Defaults are set to wait 5 seconds for the retry, and the max retry attempts is 3. The DelaySeconds and RetryCount settings can be modified to suit your needs.

 <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">  
  <sitecore>  
   <ecmDataProvider defaultProvider="sqlretry">  
    <providers>  
     <clear/>  
     <add name="sqlretry" type="Sitecore.EmailCampaign.RetryDataProvider.RetrySqlDbEcmDataProvider, Sitecore.EmailCampaign.RetryDataProvider" connectionStringName="exm.master">  
      <Logger type="Sitecore.ExM.Framework.Diagnostics.Logger, Sitecore.ExM.Framework" factoryMethod="get_Instance"/>  
      <DelaySeconds>5</DelaySeconds>  
      <RetryCount>3</RetryCount>  
     </add>  
     <add name="sqlbase" type="Sitecore.Modules.EmailCampaign.Core.Data.SqlDbEcmDataProvider, Sitecore.EmailCampaign" connectionStringName="exm.master">  
      <Logger type="Sitecore.ExM.Framework.Diagnostics.Logger, Sitecore.ExM.Framework" factoryMethod="get_Instance"/>  
     </add>  
    </providers>  
   </ecmDataProvider>  
  </sitecore>  
 </configuration>  

Source Code and Documentation

Full source code, documentation and package download is available from my GitHub repository:

https://github.com/martinrayenglish/Sitecore-EXM-3.x-Retry-Data-Provider

Sunday, September 30, 2018

Sitecore Azure PaaS: Updating Your License File For All Deployed App Service Instances

Standard

Background

If you are provisioning a new set of Sitecore environments on your own, or if the Sitecore Managed Cloud Hosting Team provisions your environments for you, you will most likely be using a temporary license file that is valid for 1 month while you are waiting for your permanent license file .

When the temporary license expires, your Sitecore instances will stop working. Therefore, it is important that you upload a valid permanent license.xml file as soon as it is available.

File Locations

In an XP Scaled environment, there are many different App Services and locations where the license.xml file will need to be updated.

I created a list of the App Service roles and the license file locations for your reference:

App Service Role License File Location
xc-search  \App_data & \App_data\jobs\continuous\IndexWorker\App_data
ma-ops  \App_data & \App_data\jobs\continuous\AutomationEngine\App_Data
cd  \App_data
cm  \App_data
ma-rep  \App_data
prc  \App_data
rep  \App_data
xc-collect  \App_data
xc-refdata  \App_data

Updating the License File

The easiest way to update the file is to use the Debug console in the Kudu "Advanced Tools" in your App Service Instance, or an FTPS client to connect directly to the App's filesystem.