Friday, October 30, 2015

Fix for Multiple Versions of Items being indexed from the Master Database

Background

This reared its head on a project a couple of months back: we were finding multiple versions of items that were being indexed from the Master Database.

After looking around on the web for a solution, I came across this post about inbound and outbound filter pipelines by the Sitecore Development Team: http://www.sitecore.net/learn/blogs/technical-blogs/sitecore-7-development-team/posts/2013/04/sitecore-7-inbound-and-outbound-filter-pipelines.aspx

The Disaster Waiting to Happen 

As Owen Wattley noted in his post, implementing the ApplyInboundIndexVersionFilter to ensure only the latest version goes into the index can cause problems that aren't apparent at first glance.

"...The problem my team found is as follows:

  1. Create an item, version 1 goes into the index because it's the latest version 
  2. Add a new version. Version 2 goes into the index because it's now the latest version. 
  3. Version 1 gets blocked by the inbound filter, meaning the index entry for version 1 DOESN'T GET UPDATED OR REMOVED. In the index it is still marked as the latest version. So is version 2. This means you have 2 versions in your index, both marked as the latest version. 



You have to be very careful with inbound filters because they don't do as you might expect. I expected that if you set "args.IsExcluded" to true then it would REMOVE that entry from the index, but it doesn't - it ONLY ensures that nothing gets ADDED. That's a subtle but very crucial difference. Once we found this problem we quickly removed the inbound latest version filter. "
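
To make the three steps above concrete, here is a minimal toy model of that add-only behavior. It is only a sketch: a plain dictionary stands in for the search index, and the class, method, and key names (InboundFilterDemo, Simulate, "item_v1") are made up for illustration and are not Sitecore APIs.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a dictionary stands in for the search index.
// Nothing here is a Sitecore type; all names are hypothetical.
public static class InboundFilterDemo
{
    // Key: a per-version unique id such as "item_v1".
    // Value: the "is latest version" flag stored with that index entry.
    public static Dictionary<string, bool> Simulate()
    {
        var index = new Dictionary<string, bool>();
        int latestVersion = 0;

        // Mimics an inbound "latest version only" filter:
        // an excluded version is skipped, and its existing entry is left untouched.
        void IndexVersion(int version)
        {
            bool isExcluded = version != latestVersion;
            if (isExcluded)
                return; // nothing is added, but nothing is removed either

            index["item_v" + version] = true;
        }

        latestVersion = 1;
        IndexVersion(1); // step 1: version 1 is indexed as latest

        latestVersion = 2;
        IndexVersion(2); // step 2: version 2 is indexed as latest
        IndexVersion(1); // step 3: version 1 is blocked, so its stale entry stays

        return index;
    }

    public static void Main()
    {
        foreach (var entry in Simulate())
            Console.WriteLine(entry.Key + " latest=" + entry.Value);
    }
}
```

In this model both entries end up flagged as the latest version, which is the duplicate the quote describes. Getting rid of the stale entry takes an explicit delete, not just a skipped add.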

The Solution

Luckily, Sitecore star Pavel Veller recommended a solution that would help alleviate these issues. I just took his idea and implemented the solution.

As this keeps popping up time and time again in the Sitecore 8.x projects that I have been working on, I wanted to share this implementation with the community.

Hope it helps!

FilterPatchItemCrawler.cs


using System;
using System.Collections.Generic;

using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Abstractions;
using Sitecore.ContentSearch.Diagnostics;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;
using Sitecore.Globalization;

namespace FilterPatch.Library.ContentSearch
{
  // Crawler that keeps only the latest version of each item, per language, in the index.
  public class FilterPatchItemCrawler : SitecoreItemCrawler
  {
    protected override void DoAdd(IProviderUpdateContext context, SitecoreIndexableItem indexable)
    {
      Assert.ArgumentNotNull((object)context, "context");
      Assert.ArgumentNotNull((object)indexable, "indexable");

      this.Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:adding", (object)context.Index.Name, (object)indexable.UniqueId, (object)indexable.AbsolutePath);
      if (this.IsExcludedFromIndex(indexable, false))
        return;
      foreach (Language language in indexable.Item.Languages)
      {
        // Fetch the latest version in this language.
        Item obj1;
        using (new FilterPatchCachesDisabler())
          obj1 = indexable.Item.Database.GetItem(indexable.Item.ID, language, Sitecore.Data.Version.Latest);
        if (obj1 == null)
        {
          CrawlingLog.Log.Warn(string.Format("FilterPatchItemCrawler : AddItem : Could not build document data {0} - Latest version could not be found. Skipping.", (object)indexable.Item.Uri), (Exception)null);
        }
        else
        {
          using (new FilterPatchCachesDisabler())
          {
            // Add only the latest version to the index.
            SitecoreIndexableItem sitecoreIndexableItem = obj1.Versions.GetLatestVersion();
            IIndexableBuiltinFields indexableBuiltinFields = sitecoreIndexableItem;
            indexableBuiltinFields.IsLatestVersion = indexableBuiltinFields.Version == obj1.Version.Number;
            sitecoreIndexableItem.IndexFieldStorageValueFormatter = context.Index.Configuration.IndexFieldStorageValueFormatter;

            this.Operations.Add(sitecoreIndexableItem, context, this.index.Configuration);
          }
        }
      }
      this.Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:added", (object)context.Index.Name, (object)indexable.UniqueId, (object)indexable.AbsolutePath);
    }

    protected override void DoUpdate(IProviderUpdateContext context, SitecoreIndexableItem indexable)
    {
      Assert.ArgumentNotNull((object)context, "context");
      Assert.ArgumentNotNull((object)indexable, "indexable");
      if (this.IndexUpdateNeedDelete(indexable))
      {
        this.Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:deleteitem", (object)this.index.Name, (object)indexable.UniqueId, (object)indexable.AbsolutePath);
        this.Operations.Delete((IIndexable)indexable, context);
      }
      else
      {
        this.Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:updatingitem", (object)this.index.Name, (object)indexable.UniqueId, (object)indexable.AbsolutePath);
        if (!this.IsExcludedFromIndex(indexable, true))
        {
          foreach (Language language in indexable.Item.Languages)
          {
            Item obj1;
            using (new FilterPatchCachesDisabler())
              obj1 = indexable.Item.Database.GetItem(indexable.Item.ID, language, Sitecore.Data.Version.Latest);
            if (obj1 == null)
            {
              CrawlingLog.Log.Warn(string.Format("FilterPatchItemCrawler : Update : Latest version not found for item {0}. Skipping.", (object)indexable.Item.Uri), (Exception)null);
            }
            else
            {
              Item[] versions;
              using (new FilterPatchCachesDisabler())
                versions = obj1.Versions.GetVersions(false);
              // Walk every version: update the latest one and delete the older ones from the index.
              foreach (Item obj2 in versions)
              {
                SitecoreIndexableItem versionIndexable = PrepareIndexableVersion(obj2, context);

                if (obj2.Version.Equals(obj1.Versions.GetLatestVersion().Version))
                {
                  Operations.Update(versionIndexable, context, context.Index.Configuration);
                  UpdateClones(context, versionIndexable);
                }
                else
                {
                  Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:deleteitem", (object)this.index.Name, (object)indexable.UniqueId, (object)indexable.AbsolutePath);
                  Operations.Delete(versionIndexable, context);
                }
              }
            }
          }
          this.Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:updateditem", (object)this.index.Name, (object)indexable.UniqueId, (object)indexable.AbsolutePath);
        }
        if (!this.DocumentOptions.ProcessDependencies)
          return;
        this.Index.Locator.GetInstance<IEvent>().RaiseEvent("indexing:updatedependents", (object)this.index.Name, (object)indexable.UniqueId, (object)indexable.AbsolutePath);
        this.UpdateDependents(context, indexable);
      }
    }

    // Wraps an item version as an indexable and sets its latest-version flag and value formatter.
    private static SitecoreIndexableItem PrepareIndexableVersion(Item item, IProviderUpdateContext context)
    {
      SitecoreIndexableItem sitecoreIndexableItem = (SitecoreIndexableItem)item;
      ((IIndexableBuiltinFields)sitecoreIndexableItem).IsLatestVersion = item.Versions.IsLatestVersion();
      sitecoreIndexableItem.IndexFieldStorageValueFormatter = context.Index.Configuration.IndexFieldStorageValueFormatter;
      return sitecoreIndexableItem;
    }

    // Re-indexes the clones of the given version, skipping any that are excluded from the index.
    private void UpdateClones(IProviderUpdateContext context, SitecoreIndexableItem versionIndexable)
    {
      IEnumerable<Item> clones;
      using (new FilterPatchCachesDisabler())
        clones = versionIndexable.Item.GetClones(false);
      foreach (Item obj in clones)
      {
        SitecoreIndexableItem sitecoreIndexableItem = PrepareIndexableVersion(obj, context);
        if (!this.IsExcludedFromIndex(obj, false))
          this.Operations.Update((IIndexable)sitecoreIndexableItem, context, context.Index.Configuration);
      }
    }
  }
}

FilterPatchCachesDisabler.cs


using System;

using Sitecore.Common;
using Sitecore.ContentSearch.Utilities;
using Sitecore.Data;

namespace FilterPatch.Library.ContentSearch
{
  // Scope helper: while an instance is alive, database caches are disabled
  // if the ContentSearch DisableDatabaseCaches setting is turned on.
  public class FilterPatchCachesDisabler : IDisposable
  {
    public FilterPatchCachesDisabler()
    {
      Switcher<bool, DatabaseCacheDisabler>.Enter(ContentSearchConfigurationSettings.DisableDatabaseCaches);
    }

    public void Dispose()
    {
      Switcher<bool, DatabaseCacheDisabler>.Exit();
    }
  }
}

Your index configuration:


<locations hint="list:AddCrawler">
  <crawler type="FilterPatch.Library.ContentSearch.FilterPatchItemCrawler, FilterPatch.Library">
    <Database>master</Database>
    <Root>#Some path#</Root>
  </crawler>
</locations>

7 comments:

Pavel Veller said...

Thanks for the reference. Sitecore Support accepted my feature request to take the code that is inside that foreach loop into a public/protected virtual method of its own so that it would be easier to implement a customization like that.

Martin English said...

Ah ok! That's great news. Thanks for letting me know. Keep up the good work. Cheers!

Varun Nehra said...

I get the whole thing about inbound/outbound filters but to simply get around the issue with older versions not being "REMOVED", here is something I did with the inbound filter that worked for me:

if (indexableItem != null && !indexableItem.Item.Versions.IsLatestVersion())
{
  args.IsExcluded = true;
  using (var context = ContentIndex.CreateDeleteContext())
  {
    context.Delete(indexableItem.UniqueId);
  }
}

Note: UniqueId includes the version number. I guess I did not need to mark it as excluded since I'm pretty much removing it from the index :). It is a whole lot cleaner to do a custom crawler, but it is a whole lot of code to keep up with.

Martin English said...

Very true, a mountain of code to maintain. Thanks for sharing your solution Varun.

Wen said...

Great post. However, I don't think it works for the web index: there is only one version in the web database, yet somehow, when Sitecore updates the index, it adds more versions to the web index rather than updating the existing entry. Strange! I will try Varun's approach and let you know.

Martin English said...

Thanks for the comment, Wenshuo. Have you tried the code? It should prevent multiple versions from going into your master index. Otherwise, let me know how it goes with Varun's code.

Cheers!

Wen said...

Martin,

My previous comment is based on the results I got after trying your code, which kind of make sense to me after debugging it. At the line "versions = obj1.Versions.GetVersions(false)", when the web index kicks in, there is only one version, which is the latest version. I have not tried Varun's code yet.

Cheers,
