Thursday, May 16, 2019

Going to Production with Sitecore 9.1 on Azure PaaS: Critical Patches Required For Stability

After spending several months upgrading our custom solution to Sitecore 9.1, and launching on Azure PaaS, I have learned a lot about what it takes to eventually see the sunshine between those stormy clouds.

This is the first of a series of posts intended to help you and your team make the transition as smooth as possible.



Critical Patches

There are several patches and configuration changes that you must deploy in order to run a stable Sitecore instance on Azure PaaS.


High CPU - Excessive Thread Consumption

Sitecore's traditional server roles (Content Management, Content Delivery, etc.) operate in a synchronous context, while xConnect operations are asynchronous. Communication between your traditional Sitecore servers and xConnect is therefore performed in a synchronous-to-asynchronous context.

This sync to async operation requires double the number of threads on the sync side in order to do the job.  This could result in there not being enough threads available to unblock the main thread.

Sitecore handled this excessive threading problem in their application code by building a custom thread scheduler. What this does is take advantage of a blocked thread to execute the operation, thus reducing the need for the additional thread, and making this synchronous to asynchronous context more efficient.

Great stuff right? Well, the problem that everyone will be faced with is that if you are not using an exact version of the System.Net.Http library, this thread scheduler simply doesn't work!

New versions of System.Net.Http don't respect the custom thread schedulers that Sitecore has built.

With the configurations that are shipped with Sitecore 9.x, the application uses the Global Assembly Cache to reference System.Net.Http, and 9 times out of 10, it will be a newer version of this library.

Without this thread scheduler working, you will end up with high CPU due to thread blocking, and your application will start failing to respond to incoming http requests.

In my case, I saw blocking appear in session end pipelines, and also in some calls on my Content Management server when working with EXM and contacts.

More detail about this issue and its fix is given in this article: https://kb.sitecore.net/articles/327701

When you read the article, you would think that it doesn't apply to you because it is referring to .NET 4.7.2, and if you are working with Sitecore 9.x, the application ships using 4.7.1.

The truth is that it does! You need to perform the following actions in order to fix the threading problem:

1. Apply the binding redirect mentioned in the article to your web.config to force Sitecore to use System.Net.Http version 4.2.0.0.

2. Deploy the System.Net.Http version 4.2.0.0 to the bin folder on all your traditional Sitecore instances.

NOTE: Make sure you remove any duplicate System.Net.Http binding redirect entries in your web.config, and that you only have the one described above.
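
For reference, a binding redirect of this kind typically looks like the sketch below. The `oldVersion` range shown here is an assumption; treat the KB article as the authoritative source for the exact values. The entry belongs inside the existing `assemblyBinding` element of your web.config.

```xml
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <!-- Force every reference to System.Net.Http to resolve to 4.2.0.0.
           The oldVersion range is an assumption; confirm it against the KB article. -->
      <dependentAssembly>
        <assemblyIdentity name="System.Net.Http" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.2.0.0" newVersion="4.2.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```

The redirect only helps if the 4.2.0.0 assembly is actually present in the bin folder, which is why step 2 is required as well.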

Reference Data

First Issue

The first patch you need adds the ability to configure cache sizes and expiration times for the UserAgentDictionaryCache, ReferringSitesDictionary, and GeoIpDataDictionary caches, and the size of the ReferenceDataClientDictionary cache. Without this patch, you will see high DTU usage (up to 100%) in your Reference Data database, because a bug allows the cache to grow enormously, which leads to performance issues and shutdowns.

In order to fix the issue, you need to review the following KB article: https://kb.sitecore.net/articles/067230

In our 9.1 instance, I used the 9.0.1.2 version of the patch.

Second Issue

The first patch is not enough to fix your Reference Data woes. There is a second set of performance issues in the stored procedures used to query the Reference Data database.

You will need to download and execute the following SQL scripts in order to fix this issue:

Redis Session Provider

First Issue

If you are on Azure PaaS, you will most definitely be using Redis as your out-of-process session state provider.

Patch 210408 is critical for the stability of session state in your environment: https://kb.sitecore.net/articles/464570

This patch limits the number of worker threads per CPU core and also reserves threads so that session end requests can be handled with as little delay as possible. Reading between the lines, this patch simply handles the Redis timeout issue more gracefully.

Without this, you will see session end events using all the threads and leaving no room to handle incoming HTTP requests. After hanging for some time, those requests eventually fail with a 502 error due to a timeout.

After applying the patch, the timeout settings referenced in this KB article will need to be made in both your web.config and Sitecore.Analytics.Tracking.config. You should also increase your pollingInterval to 60 seconds to reduce the load on your Redis instance.

Note: Depending on how much traffic your site takes on, you may need to adjust the patch settings in order to free up more threads.

So for example, you can take the original settings, and add a multiplication factor of 3 or 4. As I mentioned before, this will be up to you to determine, based on your experienced load.

Example with multiplication factor of 3:


For my shared session tracker update, I created a patch file like the following:
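
A minimal sketch of what such a patch file could look like is given here. It assumes the shared session provider is registered under the name `redis` in Sitecore.Analytics.Tracking.config, and it changes only the `pollingInterval` attribute. The timeout attributes from the KB article are deliberately left out; add them using the names and values given there.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <tracking>
      <sharedSessionState>
        <providers>
          <!-- Assumption: the shared session provider is named "redis" -->
          <add name="redis">
            <!-- Poll Redis for expired sessions every 60 seconds -->
            <patch:attribute name="pollingInterval">60</patch:attribute>
          </add>
        </providers>
      </sharedSessionState>
    </tracking>
  </sitecore>
</configuration>
```

Keeping the change in a patch file rather than editing Sitecore.Analytics.Tracking.config directly means the setting survives future upgrades of the stock configuration.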


Second Issue

Gabe Streza has a great post regarding the symptoms experienced when Redis instances powering your session state are under load: https://www.sitecoregabe.com/2019/02/redis-dead-redemption-redis-cache.html

It's important to read through his post, and also Sitecore's KB article: https://kb.sitecore.net/articles/858026

What both are basically saying is that you will need to create a new Redis instance in Azure, so that you can split your private sessions and shared sessions. So, to be clear, you will have one Redis Instance to handle private sessions and another to handle shared sessions.

I decided to keep my existing Redis instance to handle shared sessions, and used the new Redis instance to handle private sessions.

Similar to Gabe's steps, I created a new redis.sessions.private entry in ConnectionStrings.config.
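
As a rough sketch, the new entry could look like the following. The host name and access key are placeholders, and the connection string options shown are the usual ones for Azure Cache for Redis rather than values taken from my environment.

```xml
<connectionStrings>
  <!-- Placeholder values: substitute your own Redis host name and access key -->
  <add name="redis.sessions.private"
       connectionString="{your-private-redis}.redis.cache.windows.net:6380,password={access-key},ssl=True,abortConnect=False" />
</connectionStrings>
```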

I then updated my Session State provider in my web.config to the following:
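
A minimal sketch of such a provider registration is given here. It assumes the provider keeps the name `redis` and the stock Sitecore Redis provider type, and that the `timeout` value is unchanged from the default; only the `connectionString` and `pollingInterval` attributes reflect the changes described in this post.

```xml
<sessionState mode="Custom" customProvider="redis" timeout="20">
  <providers>
    <!-- Private sessions now point at the new Redis instance -->
    <add name="redis"
         type="Sitecore.SessionProvider.Redis.RedisSessionStateProvider, Sitecore.SessionProvider.Redis"
         applicationName="private"
         connectionString="redis.sessions.private"
         pollingInterval="60" />
  </providers>
</sessionState>
```

The shared session provider in Sitecore.Analytics.Tracking.config continues to use the original connection string, so each Redis instance serves only one kind of session.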

Final Thoughts 

These fixes have made a night-and-day difference in the stability of our high-traffic 9.1 sites on Azure PaaS.

Feel free to reach out to me on Twitter or Sitecore Slack if you have any questions.

Monday, January 21, 2019

Improving the Sitecore Broken Links Removal Tool


Background

While working through an upgrade to Sitecore 9.1, I ran into broken links issues that couldn't be resolved using Sitecore's standard Broken Links Removal tool.

While searching the internet, I was able to determine that I wasn't the only one that faced these types of issues.

In this post, I intend to walk you through the link problems that I ran into, and why I decided to create an updated Broken Links Removal tool to overcome the issues that the standard links removal tool wasn't able to resolve.

NOTE: The issues that I present in this post are not specific to version 9.1.  They exist in Sitecore versions going back to 8.x.


Exceptions after Upgrade Package Installation

After installing the 9.1 upgrade package and completing the post installation steps of rebuilding the links database and publishing, I discovered that lots of my site's pages started throwing the following exceptions:



The model item passed into the dictionary is of type 'Castle.Proxies.IGlassBaseProxy', but this dictionary requires a model item of type 'My Custom Model'.

During the solution upgrade, I had upgraded Glass Mapper to version 5, so I thought that the issue could be related to this. After digging in, I noticed that the items / pages that were throwing exceptions had broken links. I determined this by turning on Broken Links using the Sitecore Gutter in the Content Editor.


Next, I attempted to run the Broken Links Removal tool located at http://{your-sitecore-url}/sitecore/admin/RemoveBrokenLinks.aspx.

After it had run for several minutes, it threw the following exception:

ERROR Error looking up template field. Field id: {00000000-0000-0000-0000-000000000000}. Template id: {128ADD89-E6BC-4C54-82B4-A0915A56B0BD}
Exception: System.ArgumentException
Message: Null ids are not allowed.
Parameter name: fieldID
Source: Sitecore.Kernel
   at Sitecore.Diagnostics.Assert.ArgumentNotNullOrEmpty(ID argument, String argumentName)
   at Sitecore.Data.Templates.Template.DoGetField(ID fieldID, String fieldName, Stack`1 stack)
   at Sitecore.Data.Templates.Template.GetField(ID fieldID)

Digging In

I needed to understand why this exception was being thrown, and started down the path of decompiling Sitecore's assemblies.  My starting point for reviewing the code was Sitecore.sitecore.admin.RemoveBrokenLinks.cs which is the code behind for the Broken Links Removal page.

I took all the code and pasted it into my own ASPX page so that I could set a breakpoint and debug what was going on. After a lot of trial and error and a ton of logging, I discovered that the code throwing the error was in the FixBrokenLinksInDatabase method, on line 11 shown below:

If the Source Field ID / "itemLink.SourceFieldID" on line 11 is null (this is the field where it has determined that there is a broken link), the exception noted above will be thrown.

The Cause of the Null Source Field

During my investigation, I determined that the cause of this field being null was due to the item being created from a branch template that no longer existed.

To put this another way, the target item represented as the sourceItem in the code above (line 8) had a reference to a branch template that no longer existed, and the lookup for the item was returning a null source field.

Through my code logging and Content Editor validation, I found that we had a massive number of broken links caused by a developer deleting several EXM branch templates:



Searching Sitecore Stack Exchange and the Sitecore Community forum uncovered some decent information about this type of issue, and how to solve it manually by running a SQL query:

https://community.sitecore.net/developers/f/8/t/1784

https://sitecore.stackexchange.com/questions/88/how-do-i-fix-a-broken-created-from-reference-when-the-branch-no-longer-exists/89

Now, to fix this problem automatically using the tool, I just needed to add a null check in the code, and also create a way to clean up the references to the invalid branch templates.

Improved Broken Links Tool

The outcome of my work was an improved Broken Links Removal tool that I call the "Broken Links Eraser".

The tool does everything that the Sitecore Broken Links Removal tool does, with the following improvements:

  • Detects and removes item references to branch templates that no longer exist.
  • Removes all invalid item field references to other items (inspects all fields that contain an id).
  • Allows you to target broken links using a target path, so you don't have to run through every item in the target database. This is useful when working with large sets of content.
  • Has detailed logging while it is running and feedback after it has completed. 

The tool is built as a standalone ASPX page, so you can simply drop the file in your {webroot}/sitecore/admin folder to use it. No need to deploy assemblies and recycle app pools etc.


All updates were made using Sitecore's SqlDataApi, so the code is consistent with Sitecore's standards. The code is available on GitHub for you to download and modify as needed:



Final Thoughts

I hope that you find this tool useful in solving your broken link issues. Please feel free to add comments or contact me with any questions on either Sitecore Slack or Twitter.