Sunday, November 26, 2023

Intermittent publishing issue after upgrading from Sitecore Publishing Service 5.x to 7.x

Background

Over the past few months, we've been diligently working on upgrading our platform from version 10.1.1 to 10.3.1. If you are using Publishing Service, you are required to upgrade it to version 7.x per Sitecore's compatibility article: https://support.sitecore.com/kb?id=kb_article_view&sysparm_article=KB0761308

The Problem

We followed the upgrade steps very carefully. However, we noticed that when we ran a large related-item publish (the home item with related items and children), the following error was thrown and the publish failed:


[Error] Error in the "VariantsRelatedNodesTargetProducer"
System.InvalidOperationException: The ConnectionString property has not been initialized.
at System.Data.SqlClient.SqlConnection.PermissionDemand()
at System.Data.SqlClient.SqlConnectionFactory.PermissionDemand(DbConnection outerConnection)
at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
at System.Data.SqlClient.SqlConnection.OpenAsync(CancellationToken cancellationToken)
--- End of stack trace from previous location ---
at Sitecore.Framework.TransientFaultHandling.Sql.SqlRetryHelper.<>c__DisplayClass8_0`1.<b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Sitecore.Framework.TransientFaultHandling.Sql.SqlRetryHelper.ExecuteAsync[T](DbConnection connection, Func`1 sqlWork, Func`3 commandRetryPolicy, CancellationToken cancellationToken)
at Sitecore.Framework.Publishing.Data.AdoNet.DatabaseConnection`1.ExecuteAsync[T](Func`2 dbWork)
at Sitecore.Framework.Publishing.Data.Classic.ClassicItemRelationshipRepository.GetAllRelationships(String source, IReadOnlyCollection`1 uris, IReadOnlyCollection`1 outFieldIdsWhitelist, IReadOnlyCollection`1 inFieldIdsWhitelist, Predicate`1 outRelPostFilter, Predicate`1 inRelPostFilter)
at Sitecore.Framework.Publishing.ManifestCalculation.PublishCandidateSource.GetRelatedNodes(IReadOnlyCollection`1 locators, Boolean includeRelatedContent, Boolean includeClones)
at Sitecore.Framework.Publishing.ManifestCalculation.VariantsRelatedNodesTargetProducer.ProcessCandidatesBatch(IList`1 locators)
The error only occurred during large publishes of related items. Smaller page-level publishes were perfectly fine.

After digging deeper into the logs, we noticed a lot of OutOfMemory exceptions. All of the System.OutOfMemoryException instances occurred either when creating a dictionary with a certain capacity or when resizing a dictionary to add more elements:

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Collections.Generic.Dictionary`2.Initialize(Int32 capacity)
   at System.Collections.Generic.Dictionary`2..ctor(Int32 capacity, IEqualityComparer`1 comparer)

--------------------------------------------

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Collections.Generic.Dictionary`2.Resize(Int32 newSize, Boolean forceNewHashCodes)
   at System.Collections.Generic.Dictionary`2.Resize()
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)

The Fix

When Publishing Service runs as a 32-bit application, a large publish will produce an OutOfMemory exception while populating a dictionary, because the process reaches its virtual memory limit very early.

Even though the App Service in the Azure portal was set to 64-bit, we had to modify the Publishing Service web.config and set the "processPath" setting to "D:\Program Files\dotnet\dotnet.exe" instead of the bare "dotnet" command, which is resolved through the PATH environment variable.

After making this change, an App Service restart is required.
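If you want to check a web.config for this before you hit the problem, a small script can flag a bare "dotnet" processPath. This is only a sketch: the function name and the sample XML are made up for illustration (including the Host.dll argument), and the "(x86)" check assumes the 32-bit runtime lives under a "Program Files (x86)" folder on your host.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the arguments value is a placeholder, not the real host DLL name.
SAMPLE = r"""<configuration>
  <system.webServer>
    <aspNetCore processPath="dotnet" arguments=".\Host.dll" />
  </system.webServer>
</configuration>"""


def check_process_path(web_config_xml: str) -> str:
    """Classify the processPath attribute of the aspNetCore element."""
    root = ET.fromstring(web_config_xml)
    node = root.find(".//aspNetCore")
    if node is None:
        return "no aspNetCore element found"
    path = node.get("processPath", "")
    # A bare command has no directory separators and is resolved via PATH,
    # so it may end up pointing at a 32-bit runtime.
    if "\\" not in path and "/" not in path:
        return "bare command: resolved via PATH, bitness not guaranteed"
    # Assumption: 32-bit installs sit under "Program Files (x86)".
    if "(x86)" in path.lower():
        return "explicit 32-bit runtime path"
    return "explicit path"


print(check_process_path(SAMPLE))
```

Running this against the sample reports the bare command, which is the situation we were in before the fix.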

That's it! Back in business with those big publishes!

Wednesday, November 8, 2023

Missing language versions of items after database cleanup during upgrade

Background

Over the past few months, we've been diligently working on upgrading our system from version 10.1.1 to 10.3.1. However, during this process, a significant issue came to our attention. It became evident that certain language versions had been inadvertently removed from some of our items.

To clarify, the database prior to the upgrade did indeed contain these language versions, but after the upgrade, they were conspicuously absent. The specific item in question, which prompted this issue, lacked language versions in Danish (da-dk).

When examining the user interface (UI), we observed that the items were missing the expected language versions. Additionally, a verification revealed that these language versions were missing from the VersionedFields table in the master database.

We meticulously followed the instructions outlined in the upgrade guide and identified two potentially problematic actions within it:

1. The Sitecore.UpdateApp tool
2. The clean-up operation for databases in the control panel

This led us to conclude that one of these actions must have been the culprit behind the issue.







Root Cause

After some investigation, I pinpointed the root cause as the database clean-up operation, which unintentionally removed the item versions.

Going deeper into the issue, I noticed a disparity in the letter casing. As indicated in the screenshot, the regional code and name for the Danish language appeared as "da-dk" with "dk" in lowercase, whereas all other languages were in uppercase after the dash. I also observed that in the database, the removed item version had the language set to "da-DK."




The problem arose because the language name was set as "da-dk," while the entries it created in the VersionedFields table were stored with "da-DK." Subsequently, when the database clean-up ran, it treated the "da-DK" entries as not matching the "da-dk" language and mass-deleted them from the VersionedFields table. This behavior is inherent to the system and beyond user control.

The issue is compounded by the absence of validation when creating a new language. It's remarkably easy for someone to input their preference without adhering to the proper letter casing. A simple change to lowercase can inadvertently lead to this significant problem.
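To make the mismatch concrete, here is a small sketch of how an exact, case-sensitive comparison orphans the "da-DK" rows, and how normalizing the casing avoids it. This is not Sitecore's clean-up code; the function and variable names are invented for illustration, and the normalizer only handles simple "language" or "language-REGION" codes.

```python
def canonical_language_code(code: str) -> str:
    """Normalize casing: lowercase language, uppercase region (da-dk -> da-DK)."""
    parts = code.strip().split("-")
    if len(parts) == 1:
        return parts[0].lower()
    if len(parts) == 2:
        return f"{parts[0].lower()}-{parts[1].upper()}"
    # Codes with extra subtags are out of scope for this sketch.
    raise ValueError(f"unsupported language code: {code!r}")


defined_languages = {"en", "da-dk"}         # language names as they were entered
stored_versions = ["en", "da-DK", "da-DK"]  # language values on the stored rows

# Case-sensitive matching: both Danish rows look like they belong to no language.
orphans_exact = [v for v in stored_versions if v not in defined_languages]

# Matching on normalized codes: nothing is orphaned.
normalized = {canonical_language_code(name) for name in defined_languages}
orphans_normalized = [
    v for v in stored_versions if canonical_language_code(v) not in normalized
]

print(orphans_exact)       # ['da-DK', 'da-DK']
print(orphans_normalized)  # []
```

A check like canonical_language_code at the point where a language is created would catch the lowercase entry before any versions are written.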


Conclusion

As of now, this issue is marked as a bug, and we anticipate that it will be addressed in a future release. Until a fix is available, it's essential to remain vigilant for this particular issue.

In the meantime, you can use a SQL update command like the one below to rectify your language setting:

-- Replace the database name and the ID with those of your own language item.
UPDATE [XP0.103d_Master].[dbo].[Items]
SET [Name] = 'da-DK'
WHERE [ID] = 'B8DDF4A0-0DDD-4BC7-B6D9-FCA8C38FB740'

Please note that this SQL command can be used as a temporary solution until the bug is officially resolved in a subsequent release.



Saturday, September 30, 2023

Understanding Sitecore's Self-Adjusting Thread Pool Size Monitor

Background

I must admit, I have become mildly obsessed with threading over the last couple of years, mostly because a great deal of my work has involved stabilization and optimization practices on high-traffic Sitecore sites.

In this post, I want to focus on the Thread Pool Size Monitor that comes baked into Sitecore from version 9 onwards, because it is not widely known that it exists, what job it does, or how it can be tuned to optimize performance.

Thread Pool Size Monitor

To recap, the most important thread configuration settings are minWorkerThreads and minIOThreads, which specify the minimum number of threads available to your application's Thread Pool, instead of relying on the default formulas based on processor count, which tend to yield too few threads for high-traffic sites.

Threads that are controlled by these values can be created at a much faster rate (because they are spawned from the Thread Pool) than worker threads that are created by the CLR's default "thread-tuning" capabilities.

To summarize: 

  • Thread pool threads get thrown in faster to handle work. 
  • The CLR thread spin up algorithm is too slow and we can't rely on it to support high performance applications.

As previously mentioned, in Sitecore 9 and above, there is a pipeline processor that allows the application to adjust thread limits dynamically based on real-time thread availability (using the Thread Pool API).

This can be found in the following namespace: Sitecore.Analytics.ThreadPoolSizeMonitor.

By default, every 500 milliseconds, the processor will keep adding a value of 50 to the minWorkerThreads setting via the Thread Pool API until it determines that the minimum number of threads is adequate based on available threads.
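The loop described above can be sketched roughly as follows. Only the step of 50 and the 500 millisecond interval come from this post; the adequacy test (a low_available threshold), the cap, and all names are my own assumptions for illustration, not Sitecore's actual implementation.

```python
def next_min_worker_threads(current_min, available, max_threads,
                            step=50, low_available=50):
    """Return the minimum worker thread count after one monitor tick.

    low_available is a hypothetical threshold: if at least this many threads
    are free, the current minimum is treated as adequate.
    """
    if available >= low_available:
        return current_min
    # Not enough free threads: raise the minimum by one step, up to the cap.
    return min(current_min + step, max_threads)


def simulate(available_per_tick, start_min, max_threads):
    """Apply one adjustment per tick (one tick = 500 ms by default)."""
    history = [start_min]
    current = start_min
    for available in available_per_tick:
        current = next_min_worker_threads(current, available, max_threads)
        history.append(current)
    return history


print(simulate([10, 20, 80, 5], start_min=50, max_threads=200))
# [50, 100, 150, 150, 200]
```

In the example, the minimum climbs while few threads are free, holds steady on the tick where 80 are available, and then climbs again.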

How It Works

Since a picture is worth a thousand words, I put together a diagram of how the Thread Pool Size Monitor logic works, and provided an example with the default settings on an Azure P3v2 App Service that has 4 cores.




Custom Thread Pool Size Monitor Configuration

An enhancement that I made on a past PaaS implementation was to tune Sitecore’s dynamic thread processor using a more “aggressive” configuration. This helped me with those “bursty” web traffic situations where I needed to be sure that I had enough threads available to serve the current demand.

Here is the configuration that I used:

Wednesday, July 19, 2023

Sitecore Publishing Service - Using Sitecore PowerShell Extensions To Move Publishing Jobs To The Top Of The Queue

Background

In my previous post, I provided a way to get a job queue report using PowerShell Extensions (SPE). In this post, I am going to show how you can use the output from the report to promote publishing jobs to the top of the queue using SPE.

Large Publishing Queue

You may ask, well why? Sitecore's Publishing Service is a great improvement over the out-of-the-box publishing mechanism, and is pretty fast at publishing items.

That is indeed true. However, when working with extremely large sites with several hundred content authors and multiple publishing targets, the queue can become extremely long. I have seen it grow to several thousand items, with publishing taking several hours.

This is problematic if you have something urgent that needs to be published, as the job could be sitting in the queue for hours!

The Solution

As you saw in my last post, it is pretty simple to access the SQL publishing queue database table using SPE. Because the queue operates as a first-in-first-out (FIFO) data structure based on the "Queued" datetime field, I discovered that simply updating the target job's datetime field to a smaller value would instantly move the job higher in the queue.


So, my final logic was this:

  • Get smallest Queued datetime of the jobs sitting in the queue that still needed to be published
  • Subtract 2 minutes from the value
  • Update the queued datetime of the job that I want to promote to the top of the queue with this new smaller datetime value
  • Done! My job was popped to the top!
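The steps above can be sketched in a few lines. This is an in-memory illustration of the date arithmetic only, not the SPE script itself; the dictionary keys ('id', 'queued', 'status') and the "Queued" status value are hypothetical stand-ins for the real table columns.

```python
from datetime import datetime, timedelta


def promoted_queued_time(pending_queued_times, lead=timedelta(minutes=2)):
    """Smallest queued datetime among pending jobs, minus the lead time."""
    if not pending_queued_times:
        raise ValueError("no pending jobs in the queue")
    return min(pending_queued_times) - lead


def promote(jobs, job_id):
    """Give job_id a queued datetime earlier than every pending job."""
    pending = [job["queued"] for job in jobs if job["status"] == "Queued"]
    new_time = promoted_queued_time(pending)
    for job in jobs:
        if job["id"] == job_id:
            job["queued"] = new_time
            return new_time
    raise KeyError(job_id)


jobs = [
    {"id": "a", "queued": datetime(2023, 7, 19, 9, 0), "status": "Queued"},
    {"id": "b", "queued": datetime(2023, 7, 19, 9, 5), "status": "Queued"},
    {"id": "c", "queued": datetime(2023, 7, 19, 9, 30), "status": "Queued"},
]

new_time = promote(jobs, "c")
print(new_time)  # 2023-07-19 08:58:00
```

After the call, sorting the jobs by their queued value puts job "c" first, which is exactly what the FIFO queue sees.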

Now, this is the perfect pairing with the job queue report from my previous post. You can use the report to find the job and its id that you want to promote, and then use the id to run the script to promote the job!

I recommend following this guide to convert this into your own SPE custom module for your solution: Modules - Sitecore PowerShell Extensions

 




I hope you find this script another useful add to your PowerShell toolbox.

Monday, March 20, 2023

Sitecore Publishing Service - Publishing Job Queue Report Using Sitecore PowerShell Extensions

Background

Sitecore's Publishing Service only allows you to see a maximum of 10 items at a time within the Queued or Recent jobs reports within the dashboard.

This is not ideal if you need to see how many total items are in the queue, need to get an estimate of how long it will take to get your publish live, or quite simply need to do any type of analysis or troubleshooting.

Usually, you will have to talk to a DevOps person who has access to your Sitecore Master database, and get them to write a somewhat complex SQL query against your Publishing_JobQueue table to get you the information that you need.

The query is a bit complex because most of the key information is stored in an XML field called "Options" within this particular table.
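To show why an XML column makes reporting harder, here is a sketch of pulling a few values out of an XML blob. The element names in the sample (PublishOptions, ItemId, Descendants, RelatedItems) are invented for illustration; inspect a real row of your Publishing_JobQueue table for the actual structure of the Options field.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real Options XML will have a different structure.
SAMPLE_OPTIONS = """<PublishOptions>
  <ItemId>110d559f-dea5-42ea-9c1c-8a5df7e70ef9</ItemId>
  <Descendants>true</Descendants>
  <RelatedItems>false</RelatedItems>
</PublishOptions>"""


def summarize_options(options_xml, fields):
    """Return {field: text} for each requested element, '' when it is absent."""
    root = ET.fromstring(options_xml)
    return {name: (root.findtext(f".//{name}") or "") for name in fields}


print(summarize_options(SAMPLE_OPTIONS, ["ItemId", "Descendants", "Missing"]))
```

Every value you want in a report needs its own lookup like this, which is what makes the equivalent SQL query long.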


PowerShell For The Win

After spending a bit of time formulating a decent SQL query that would get the key information that we were after, I decided to take it one step further by incorporating it into a PowerShell script that could be generated on demand from within the Sitecore console, and also output a searchable and downloadable report.

A clear win for our Authoring Admins and DevOps teams!

I hope you find this script a useful add to your PowerShell toolbox.



Monday, March 13, 2023

Sitecore Content Hub - Set up SAML-based SSO in Azure AD using an App Registration

Background

In this post, I will show you how to create and configure an Azure Application Registration in your tenant to allow Sitecore Content Hub users to successfully authenticate against your Azure Active Directory.

Options

The Content Hub team's preferred set up option is to create an Enterprise application within your Azure AD, but unfortunately for us, our DevOps would not allow this due to very strict security constraints that we had to abide by. This is the main reason that we had to go the App Registration route.

We initially tried to get the App Registration working using Microsoft Provider SSO, but could not get the proper Group claims working correctly.

As a result, we focused on configuring SAML Auth within our App Registration, and were able to get all the claims needed to successfully get SSO authentication working with this approach.

Set up within Azure

Within your Azure Portal, find App registrations and click on the New Registration button. Give it a name, and leave the default options selected, and click Register.


Within the newly created registration, go to the Authentication menu option within the Manage section.

Click "Add a platform", and then select "Web".


Set your Redirect URIs to be the Content Hub portal URL. You will be able to add additional URIs after the initial set up. For now, I will use a default one.

Make sure you check the Access tokens and ID tokens boxes within the Implicit grant and hybrid flows section.




After this, click the Configure button.

Next, go to the Token configuration menu option within the Manage section.

Click Add group claim, and check the Security group box. Confirm that the Group ID radio option is selected within the ID, Access and SAML options.

Click the Add button.


Next, click Add optional claim.

Within Token type, select SAML, and check the email Claim box. Click Add.


When prompted, check the "Turn on the Microsoft Graph email permission" box to allow the claims to appear in the token. Click Add.


Next, go to the Expose an API menu option within the Manage section. Click Add a scope, and it will generate an Application ID URI for you. 

Make note of this, as you will need it for the Content Hub side.

Click Save and continue.


After it has been created, you can click the Cancel button.




Go to the Overview menu option, and click Endpoints. Find the Federation metadata document XML URL and make note of it.

Then, copy and paste it into a new browser tab.





Make note of the entityID.

Your set of notes should look similar to this:


Set up within Content Hub

Log into your Content Hub portal. Click on Manage, and then go to Settings.



Within Settings, go to PortalConfiguration, and select the Authentication menu option. Change the view to Text as it's easier to work with.

Within the saml section of the ExternalAuthenticationProviders JSON config, set the key values to what you saved in your notes. Make sure you set the provider_name and add some basic messages.



Example:
 "ExternalAuthenticationProviders": {
   "global_username_claim_type": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name",  
   "global_email_claim_type": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress",  
   "google": [],  
   "Microsoft": [],  
   "saml": [  
    {  
     "metadata_location": "https://login.microsoftonline.com/8ac76c91-e7f1-41ff-a89c-3553b2da2c17/federationmetadata/2007-06/federationmetadata.xml",  
     "sp_entity_id": "api://c8696890-1d5f-479b-9df1-154e8f315165",  
     "idp_entity_id": "https://sts.windows.net/8ac76c91-e7f1-41ff-a89c-3553b2da2c17/",  
     "password": null,  
     "certificate": null,  
     "binding": "HttpRedirect",  
     "authn_request_protocol_binding": null,  
     "is_enabled": true,  
     "provider_name": "martinSamlNewLocal",  
     "messages": {  
      "signIn": "Martin SAML SSO Test"  
     },  
     "authentication_mode": "Passive"  
    }  
   ],  
   "sitecore": [],  
   "ws_federation": [],  
   "yandex": []  
  }  
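If you set this up for more than one environment, it can help to generate the saml entry from your notes. The sketch below mirrors the keys from the example above; the function name is made up, and the two URL patterns are inferred from that example, so treat the values you copied from the Azure portal as the source of truth.

```python
import json


def build_saml_provider(tenant_id, app_id_uri, provider_name, sign_in_message):
    """Build one entry for the "saml" list, using the keys shown in the example."""
    return {
        # URL patterns inferred from the example; verify against your portal values.
        "metadata_location": (
            f"https://login.microsoftonline.com/{tenant_id}"
            "/federationmetadata/2007-06/federationmetadata.xml"
        ),
        "sp_entity_id": app_id_uri,
        "idp_entity_id": f"https://sts.windows.net/{tenant_id}/",
        "password": None,
        "certificate": None,
        "binding": "HttpRedirect",
        "authn_request_protocol_binding": None,
        "is_enabled": True,
        "provider_name": provider_name,
        "messages": {"signIn": sign_in_message},
        "authentication_mode": "Passive",
    }


entry = build_saml_provider(
    tenant_id="8ac76c91-e7f1-41ff-a89c-3553b2da2c17",
    app_id_uri="api://c8696890-1d5f-479b-9df1-154e8f315165",
    provider_name="martinSamlNewLocal",
    sign_in_message="Martin SAML SSO Test",
)
print(json.dumps(entry, indent=1))
```

The printed JSON can be pasted into the saml list in the Text view of the Authentication settings.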

Click Save, and you are done!

You are now ready to test out your authentication using your shiny, new authentication button.

Users with more than 200 groups

We found a limitation with SSO authentication group claims in Azure AD (https://docs.microsoft.com/en-us/azure/active-directory/hybrid/how-to-connect-fed-group-claims): if more than 200 groups are associated with a user, the SSO authentication will provide a Graph link instead of passing in the group claims.

There is currently no solution for this problem. We are handling this handful of users via manual security setup.

Sunday, January 23, 2022

Sitecore Cleanup Monitor - Proactively keeping an eye on your Event Queue, History and Publish Queue tables

Background

There are several horror stories floating around the web about the Event Queue bringing Sitecore down to its knees.

Brian Pedersen
https://briancaos.wordpress.com/2016/08/12/sitecore-event-queue-how-to-clean-it-and-why/
https://briancaos.wordpress.com/2014/10/23/sitecore-eventqueue-deadlocks-how-to-solve-them-and-how-to-avoid-them/

Andy Cohen
https://blog.horizontalintegration.com/2016/02/09/sitecore-eventqueue-strikes-again/

I have experienced trouble myself
http://sitecoreart.martinrayenglish.com/2016/08/diagnosing-content-management-server.html

The Last Straw 

There is a bug in pre-8.1 U3 releases (I am on 8.1 U2) that will cause the Event Queue table in the Core database to be flooded with timestamp data from your Sitecore servers in a scaled environment.

The issue was related to the property:changed event that was being added to the Event Queue. Every 10 seconds, each Sitecore instance would call the SetTimestampForLastProcessing method.

There was no need to inform other instances about an update to the local instance's last-processed timestamp, and Sitecore Support provided me with a patch where they simply used the event disabler to fix the issue.

Here is a copy of the patch for download if you are having this problem: https://www.dropbox.com/s/lpjhil5rf9dri0n/Sitecore.Support.99697.zip?dl=0

After experiencing this and other problems in the past, I decided to take action.

Sitecore Cleanup Monitor Module 

The Event Queue was my initial focus, but per Sitecore's Performance Tuning Guide, in order to keep Sitecore running optimally, we need to keep the Event Queue, History and Publish Queue tables below 1000 rows: https://sdn.sitecore.net/upload/sitecore7/70/cms_tuning_guide_sc70-72-a4.pdf. The reason for this is SQL deadlocking: https://technet.microsoft.com/en-us/library/ms177433(v=sql.105).aspx.

With all this being said, I decided to put together a module that would keep an eye on these key tables.

The module consists of 3 agents that will monitor the Event Queue, Publish Queue and History tables to ensure that they don't exceed a set threshold.



Why would you use it?

In many cases, Sitecore's default cleanup agents just aren't efficient enough in cleaning up these key Sitecore tables.

This module allows you to be proactive instead of reactive, so that you don't have to log into your SQL instance to manually run queries to clean up your tables, usually after the $#!,$h has hit the fan.

How does it work? 

When due, the agent will check the row count of the target table in each database (core, master and web), and if the count is above the set threshold, it will remove the oldest rows, bringing the row count down to the threshold. It won't do anything to tables with row counts that are below the threshold.
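The trimming rule can be sketched like this. It is an in-memory illustration rather than the module's actual code (which is available in the GitHub repository); the function name and the (row_id, created) tuple shape are invented for the example.

```python
from datetime import datetime


def rows_to_remove(rows, threshold):
    """Return the ids of the oldest rows that push the table over threshold.

    rows is a list of (row_id, created) tuples.
    """
    excess = len(rows) - threshold
    if excess <= 0:
        return []  # at or below the threshold: leave the table alone
    oldest_first = sorted(rows, key=lambda row: row[1])
    return [row_id for row_id, _ in oldest_first[:excess]]


rows = [
    (1, datetime(2022, 1, 20)),
    (2, datetime(2022, 1, 23)),
    (3, datetime(2022, 1, 21)),
    (4, datetime(2022, 1, 22)),
]

print(rows_to_remove(rows, threshold=2))  # [1, 3]
print(rows_to_remove(rows, threshold=5))  # []
```

With a threshold of 2, the two oldest rows are selected for removal and the two newest are kept.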

You can set how often you want each agent to run, and what you want your threshold / table row count to be. You also don't need to use all three agents. If you only want to monitor the Event Queue for example, simply comment out or remove the other agents from the module's config file.

You can monitor its activity by examining your Sitecore logs. Here is a snapshot example:


Installation and Configuration

Documentation, full source code and package download is available from my GitHub repository: https://github.com/martinrayenglish/Sitecore.Cleanup

The module is available on the Sitecore Marketplace: https://marketplace.sitecore.net/Modules/S/Sitecore_Cleanup_Monitor.aspx