Background
In my previous post, I wrote about how you can extend the xConnect Data Adapter Provider to prevent unwanted data from being written to your shard databases.
But what if you already have a lot of analytics and contact data in your shards, have an xDB index issue, and are struggling to rebuild your index?
Or, what if you want to keep the data in your databases just in case you may need it some time in the future but want to lighten the load on your xDB index to improve performance?
In this post, I will show you how to filter data from your xDB index to reduce the index size and growth rate. This will also improve your rebuild times tremendously.
You can also skip ahead and check out the full source code for my example on my GitHub repo: https://github.com/martinrayenglish/Test.SCExtensions
Business Scenario
In a 9.1 project, I was faced with very slow rebuild times, where it was literally taking weeks to rebuild our xDB index. I was very concerned about the amount of data we were ingesting into the xDB index from the shard databases, specifically the rate at which interactions were being added. I have an older post that talks about this topic, so give it a read if you have not already.
Prior to 9.2, there is no truly supported way to purge old interaction data, so I was faced with the question of how to reduce the amount of interaction data in our xDB index and prevent our business from being paralyzed for weeks waiting for a future rebuild to complete.
Solving The Problem
As the sheer quantity of interactions was the problem, I wanted to be able to filter both new and existing interactions as they were brought in from the shard databases, whether during a rebuild or after the session end data aggregation process.
Round 1
In my initial exploration, I focused on customizing the index writer code as that seemed to be the sweet spot. The code below shows an example of where I customized it to remove ALL interactions from making their way to the xDB index (see line 28 where I simply new up a new blank list of InteractionDataRecord objects):
To apply the updated writer, I put the new assembly into
{xconnect}\bin
and
{xconnect}\App_data\jobs\continuous\IndexWorker
and then updated the XML files to use the new writer in {xconnect}\App_data\jobs\continuous\IndexWorker\App_data\Config\Sitecore\SearchIndexer\sc.Xdb.Collection.IndexWriter.Solr.xml
and
{xconnect}\App_data\config\sitecore\SearchIndexer\sc.Xdb.Collection.IndexWriter.Solr.xml
<IIndexWriter>
<Type>Test.SCExtensions.Xdb.Collection.Search.Solr, Test.SCExtensions</Type>
<As>Sitecore.Xdb.Collection.Indexing.IIndexWriter, Sitecore.Xdb.Collection</As>
<LifeTime>Singleton</LifeTime>
</IIndexWriter>
This worked well for new interactions coming in from the shard databases after session end, but it did not work for rebuild operations, so I needed to explore further.
Round 2
The problem I discovered after my initial attempt was that the SolrWriter type specified above is not used by the rebuilder, because of how the dependency injection is implemented, so writing a custom implementation of the rebuilder class was not a valid option.
I started looking into other options to override the related logic.
After investigating further, I discovered that a custom implementation of both the IndexRebuilderFilter and IndexWriterFilter would take care of both new interactions from sessions and existing interactions coming in via a rebuild operation.
I created two new classes to perform these tasks and used them to decorate the related interfaces.
Again, in these examples, I am removing ALL interactions from my xDB index. In a real world scenario, it would be best to filter based on analytics data that is valuable to your business so that it will be available via the xDB search API.
I show you how to do this further down in my post: More Bang - Selective xDB Index Filtering.
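To make the decorator idea concrete, here is a minimal, self-contained sketch of the pattern. It is only an illustration under assumptions: InteractionRecord, IRecordWriter, RecordingWriter and DropAllInteractionsDecorator are hypothetical stand-ins I made up for this example, not the Sitecore interfaces or the classes from my repo, so treat it as the shape of the approach rather than the actual implementation.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are NOT the Sitecore types.
public sealed class InteractionRecord
{
    public string Id { get; set; } = "";
}

public interface IRecordWriter
{
    void Write(IReadOnlyCollection<InteractionRecord> interactions);
}

// Simple inner writer that remembers what it was asked to write,
// so the effect of the decorator is easy to observe.
public sealed class RecordingWriter : IRecordWriter
{
    public List<InteractionRecord> Written { get; } = new List<InteractionRecord>();

    public void Write(IReadOnlyCollection<InteractionRecord> interactions)
    {
        Written.AddRange(interactions);
    }
}

// Decorator: implements the same interface, wraps the inner writer, and
// replaces the incoming interactions with an empty list before delegating.
public sealed class DropAllInteractionsDecorator : IRecordWriter
{
    private readonly IRecordWriter _inner;

    public DropAllInteractionsDecorator(IRecordWriter inner)
    {
        _inner = inner;
    }

    public void Write(IReadOnlyCollection<InteractionRecord> interactions)
    {
        // Ignore whatever came in and pass an empty list down the chain.
        _inner.Write(new List<InteractionRecord>());
    }
}
```

Because the decorator exposes the same interface as the thing it wraps, the calling code does not need to know it is there. That is what lets the same trick apply to both the writer and the rebuilder.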
Next, I created decorator XML configs to allow xConnect to use my new writer extension code.
Rebuilder writer config location: {xconnect}\App_Data\jobs\continuous\IndexWorker\App_data\config\sitecore\SearchIndexer\sc.Xdb.Collection.Indexing.IndexRebuilderFilterDecorator.xml
<?xml version="1.0" encoding="utf-8"?>
<Settings>
<Sitecore>
<XConnect>
<SearchIndexer>
<Services>
<IIndexRebuilderFilterDecorator>
<Type>Sitecore.XConnect.Configuration.ServiceDecorator`2[[Sitecore.Xdb.Collection.Indexing.IIndexRebuilder, Sitecore.Xdb.Collection], [Test.SCExtensions.Xdb.Collection.Search.Solr.IndexRebuilderFilterDecorator, Test.SCExtensions]], Sitecore.XConnect.Configuration</Type>
<LifeTime>Singleton</LifeTime>
</IIndexRebuilderFilterDecorator>
</Services>
</SearchIndexer>
</XConnect>
</Sitecore>
</Settings>
Writer config location: {xconnect}\App_Data\jobs\continuous\IndexWorker\App_data\config\sitecore\SearchIndexer\sc.Xdb.Collection.Indexing.IndexWriterFilterDecorator.xml
<?xml version="1.0" encoding="utf-8"?>
<Settings>
<Sitecore>
<XConnect>
<SearchIndexer>
<Services>
<IIndexWriterFilterDecorator>
<Type>Sitecore.XConnect.Configuration.ServiceDecorator`2[[Sitecore.Xdb.Collection.Indexing.IIndexWriter, Sitecore.Xdb.Collection], [Test.SCExtensions.Xdb.Collection.Search.Solr.IndexWriterFilterDecorator, Test.SCExtensions]], Sitecore.XConnect.Configuration</Type>
<LifeTime>Singleton</LifeTime>
</IIndexWriterFilterDecorator>
</Services>
</SearchIndexer>
</XConnect>
</Sitecore>
</Settings>
After applying these updates, I was able to successfully filter interactions coming from the shard databases to my xDB index during both rebuilds and session ends!
More Bang - Selective xDB Index Filtering
As you don't necessarily want to remove all interactions from the xDB index unless you are in dire straits, I added a new class for filtering interactions based on specific events. In other words, if an interaction has an event in it that we care about, we allow it to go into the index.
By using this example, you can lighten your index and still include the interactions that are important to you for personalization, etc.
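The rule itself ("keep an interaction only if it contains an event we care about") can be sketched in a few lines. This is a simplified illustration under assumptions: SampleEvent, SampleInteraction and InteractionEventFilter are hypothetical stand-in types made up for this example, not the xConnect model classes, and the set of allowed event definition IDs is something you would supply yourself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; not the xConnect model classes.
public sealed class SampleEvent
{
    public Guid DefinitionId { get; set; }
}

public sealed class SampleInteraction
{
    public string Id { get; set; } = "";
    public List<SampleEvent> Events { get; } = new List<SampleEvent>();
}

public static class InteractionEventFilter
{
    // Returns only the interactions that contain at least one event whose
    // definition ID is in the allowed set; everything else is left out.
    public static List<SampleInteraction> KeepWithAllowedEvents(
        IEnumerable<SampleInteraction> interactions,
        ISet<Guid> allowedEventIds)
    {
        return interactions
            .Where(i => i.Events.Any(e => allowedEventIds.Contains(e.DefinitionId)))
            .ToList();
    }
}
```

A filter like this would be called from the decorators in place of the "empty list" approach, so that only the matching interactions are passed on to the index.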