Friday, October 21, 2016

Taming Your Sitecore Analytics Index by Filtering Anonymous Contact Data

With the release of Sitecore versions 8.1 U3 and 8.2, there is a new setting that will dramatically reduce the activity on your instance's analytics index by filtering out anonymous contact data from it.

To put this simply; you don't have to have all the anonymous visitor data added to your analytics index anymore.

 xDB will still capture and show the anonymous visitor data in the various reporting dashboards, but this data won't be added to your analytics index, and you won't see the anonymous contacts in the Experience Profile dashboard.

The new "ContentSearch.Analytics.IndexAnonymousContacts" setting can be found in the Sitecore.ContentSearch.Analytics.config file, and is set to "true" by default.

To quote the setting comments found in this file:

"This setting specifies whether anonymous contacts and their interactions are indexed.
If true, all contacts and all their interactions are indexed. If false, only identified contacts and their interactions are indexed. Default value: true".

One of the key changes to the core code can be seen in in the Sitecore.ContentSearch.Analytics assembly. The magic is on line 14:

1:  using Sitecore.Analytics.Model.Entities;  
2:  using Sitecore.ContentSearch.Analytics.Abstractions;  
3:  using Sitecore.Diagnostics;  
5:  namespace Sitecore.ContentSearch.Analytics.Extensions  
6:  {  
7:   public static class ContactExtensions  
8:   {  
9:    public static bool ShouldBeIndexed(this IContact contact)  
10:    {  
11:     Assert.ArgumentNotNull((object) contact, "contact");  
12:     ISettingsAnalytics instance = ContentSearchManager.Locator.GetInstance<ISettingsAnalytics>();  
13:     Assert.IsNotNull((object) instance, "Settings for contact segmentation index cannot be found.");  
14:     if (instance.IndexAnonymousContacts())  
15:      return true;  
16:     return !string.IsNullOrEmpty(contact.Identifiers.Identifier);  
17:    }  
18:   }  
19:  }  

Why does this matter? 

One of our clients started having some severe Apache Solr issues due to the JVM using a massive amount of memory after running xDB for several months. After our investigation, we discovered that the root cause of the memory usage was due to the analytics index being pounded during the aggregation process. 

The JVM memory usage was like a ticking time bomb. As we started collecting more and more analytics data, our java.exe process started using more and more memory. 

At launch, we gave 4GB to the Java heap size (for more info around this, you can Google Xms<size> -Xmx<size> values). After a few months of running the sites and discovering the memory issue, we felt as though perhaps we assigned our Xmx too low, and upped the memory limit to 8GB. A few weeks later, we outgrow this limit, and we bumped it up to 16GB. 

The high memory usage would eventually cause Solr not respond to query requests and the Sitecore instance to stop functioning. As we know, Sitecore is heavily dependent on an indexing technology (Solr or Lucene), and if it fails, chances are your instance will stop functioning unless you have a magically patch that I mentioned in my previous post:

Analytics Index Comparison 

After upgrading our instance from 8.1 U1 to 8.1 U3, and disabling this setting, we performed an index size comparison. Our analytics index went from 21,728,706 docs and 8GB in size to 0 docs and 101 bytes in size (empty). It's important to note that this is because we currently don’t have any identified contacts within xDB. I find it hard to believe that when we start our contact identification process using CRM system data, that we will be seeing sizes like this in the future.

Final Thoughts 

This setting has made a major different in the stability of our client's high traffic Sitecore sites. It's up to you and your team to decide how important it is to have those anonymous contact records show up in the Experience Profile dashboard. 

To us, it was a no-brainer.

Tuesday, September 6, 2016

Bulletproofing your Sitecore Solr and SolrCloud Configurations


Solr and SolrCloud 

As we know, Sitecore supports both Lucene and Solr search engines. However, there are some compelling reasons to use Solr instead of Lucene that are covered in this article:

Solr has been the search engine choice for all of my 8.x projects over the last few years and I have recently configured SolrCloud for one of my clients where fault tolerance and high availability was an immensely important requirement.

Although I am a big fan of SolrCloud, it is important to note that Sitecore doesn't officially support SolrCloud yet. For more details, see this KB article:

So, should SolrCloud still be considered in your architecture?

My answer to this question is YES!

My reasoning is that members of Sitecore's Technical and Professional Services Team, have implemented a very stable patch to support SolrCloud that has been tested and used in production by extremely large scale SolrCloud implementations. More about this later.

In addition, if you are running xDB, your Analytics index will get very large over time, and the only way to handle this is to break it up unto multiple shards. SolrCloud is needed to handle this.

The Quest to Keep Solr Online 

One of our high traffic clients running xDB started having Solr issues recently and this sparked my research and work with the Sitecore Technical Services team to obtain a patch to keep Sitecore running if Solr was having issues.

As a side note; the issues that we started seeing were related to the Analytics index getting pounded. The most common error that we saw was the following:

 <title>502 Proxy Error</title>  
 <h1>Proxy Error</h1>  
 <p>The proxy server received an invalid  
 response from an upstream server.<br />  
 The proxy server could not handle the request <em><a href="/solr/sitecore_analytics_index/select">GET&nbsp;/solr/sitecore_analytics_index/select</a></em>.<p>  
 Reason: <strong>Error reading from remote server</strong></p></p>  

This only popped up after running xDB for several months, as our analytics index started getting fairly large. Definitively something to keep in mind when you are planning for growth, and as mentioned above, why SolrCloud is the best option for a large-scale, enterprise Sitecore search configuration.

Giving the Java Virtual Machine (JVM) running Apache more memory seemed to help, but this error would continue to rear its nasty head, every so often during periods of high traffic.

Sitecore is very sensitive to Solr connection issues, and will be brought down its knees and throw an exception if it has any trouble!

The Bulletproof Solr Patches 

Single Instance Solr Configuration - Patch #391039 

My research to keep Sitecore online if there are Solr issues led me to this post by Brijesh Patel that was published back in March. After reading though it, I decided to contact Sitecore Support about patch #391039, as it seemed to be just what I wanted for my client's single Solr server configuration.

Working with Andrew Chumachenko from support, our tests revealed that the patch published here didn't handle index "SwitchOnRebuilds". To me, this was a deal breaker.

Andrew discovered that there were several versions of patch #391039 (early versions of the patch were implemented for Sitecore versions 7.2 ), and found at least three different variations.

We found that the most recent version of the patch did in fact support "SwitchOnRebuilds", and Andrew made this available to everyone in the community on GitHub:

This is a quote from Brijesh's post to explain how it works:

" checks if Solr is up on Sitecore start. If no, it skips indexes initializing. However, it may lead to exceptions in log files and inconsistencies while working with Sitecore when Solr is down.

Also, there is an agent defined in the ‘Sitecore.Support.391039.config’ that checks and logs the status of Solr connection every minute (interval value should be changed if needed).

If the Solr connection is restored — indexes will be initialized, the corresponding message will be logged and the search and indexing related functionality will work fine."

SolrCloud Solr Configuration - Patch #449298 

This patch works the same way as patch #391039 described above, but supports SolrCloud.

You may be asking yourself, "isn't the point of having a highly available Solr configuration to ensure that my Solr search doesn’t have issues?"

Well, of course. But, due to the nature in which SolrCloud operates, this patch acts as a fail-safe if something goes wrong - if for example your Zookeepers are trying to determine who the leader is if you lose an instance. If there is a mere second that Sitecore is trying to query Solr, and has trouble, it will throw an exception.

So, patch #449298 accounts for this and also allows index "SwitchOnRebuilds" just like the common, single instance Solr server configurations.

GitHub for this patch: 

It is important to note that this patch requires an IoC container that injects proper implementations for SolrNet interfaces. It depends on patch Sitecore.Support.405677. You can download the assemblies based on your IoC container from this direct link:

Looking Ahead 

Support for Solr out-of-the box (taking into account these patches ) is to be added to the upcoming Sitecore 8.2 U1. So definitely something to look forward to in this release.

A special thanks to Paul Stupka, who is the mastermind behind these patches, and rockstar Andrew Chumachenko for all his help.

Tuesday, August 2, 2016

Diagnosing Content Management Server Memory Issues After a Large Publish



My current project involved importing a fairly large number of items into Sitecore from an external data source. We were looking at roughly around 600k items that weren't volatile at all. We would have a handful of updates per week.

At the start of development, we debated between using a data provider or going with the import, but after doing a POC using the data provider, it was clear that an import was the best option.

The details of what we discovered would make a great post for another time.

NOTE: The version we were running was Sitecore 8.1 Update 2.

The Problem 

After running the import on the our Staging Content Management Server, we were able to successfully populate 594k items in the master database without any issues.

The problem reared its ugly head after we published the large number of items.

After the successful publish, we noticed that there was an instant memory spike on the Content Management Server after the application pool had recycled. Within about 10 seconds, memory usage would reach 90%, and would continue to climb until IIS simply gave up the ghost.

Mind you, our Staging server was pretty decent, an AWS EC2 Windows instance loaded with 15GB of RAM.

So what would cause this?


I confirmed that my issue was in fact caused by the publish by restoring a backup of the web database from before the publish had occurred and recycling the application pool of my Sitecore instance. 

I decided to take a look at what objects were filling up the memory, and so I loaded and launched dotMemory from JetBrains and started my snapshot.

The snapshot revealed some QueuedEvent lists that were eating up the memory:

Next, I decided to fire up SQL Server Profiler to investigate what was happening on the database server.

Running Profiler for about 10 seconds while Sitecore was starting up, showed the following query being executed 186 times within the same process:

SELECT TOP(1) [EventType], [InstanceType], [InstanceData], [InstanceName], [UserName], [Stamp], [Created] FROM [EventQueue] ORDER BY [Stamp] DESC

Why would Sitecore be executing this query so many times, and then filling up the memory on our server?

I know that Content Management instances have a trigger to check the event queue periodically and collect all events to be processed. But, this seemed very strange.

For more info on how this works, you can check out this article by Patel Yogesh:
It's older but still applicable.

I shifted focus onto the EventQueue table to see what it looked like.

EventQueue Table 

A count on the items in my Web database's EventQueue table returned 1.2M.

99% of the items in the EventQueue table were the following remote event records: 

Sitecore.Data.Eventing.Remote.SavedItemRemoteEvent, Sitecore.Kernel, Version=, Culture=neutral, PublicKeyToken=null 

Sitecore.Data.Eventing.Remote.CreatedItemRemoteEvent, Sitecore.Kernel, Version=, Culture=neutral, PublicKeyToken=null 

I ran the following queries to tell me how many "SaveItem" event entries and how many "CreatedItem" event entries existed in the table, that were ultimately put there by my publish: 

SELECT *   FROM [Sitecore_Web].[dbo].[EventQueue]    
WHERE EventType LIKE '%SavedItem%'  AND UserName = 'sitecore\arke'  

SELECT *   FROM [Sitecore_Web].[dbo].[EventQueue]
WHERE EventType LIKE '%CreatedItem%'  AND UserName = 'sitecore\arke'  

Both the former and the later returned 594K items each. This seemed to line up correctly with the number of items that I had recently published, but the fact that we had two entries for each item was the obvious cause of the table having well over 1 million records.

The Solution 

There is a good post on the Sitecore Community site, where Vincent van Middendorp mentions a few Truncate queries to empty the EventQueue table along with the History table:

Truncating the table seemed a bit too evasive at first, so I went ahead and wrote up a quick query to delete the records from the EventQueue table that I knew I had put there (based on my username):

DELETE   FROM [Sitecore_Web].[dbo].[EventQueue] 
WHERE EventType LIKE '%CreatedItem%' OR EventType LIKE '%SavedItem%' 
AND UserName = 'sitecore\arke' 

Running another count on the records in my EventQueue table returned a count of 7.

So, I may well have just run a truncate :)

After firing up my the Sitecore instance again, I was happy to report that memory on the server was now stable.

The Moral of the Story 

Keep an eye on that EventQueue after a large publish!

Looking forward to seeing the publishing improvements coming in Sitecore 8.2.

Monday, June 20, 2016

Sitecore's IP Geolocation Service - Working with the Missing Area Code, Missing Time Zone and Testing GeoIP Lookups



My current projects make heavy use of GeoIP personalization, and as a result, I have had the opportunity to dig deep into Sitecore's IP Geolocation Service features, uncovering the gaps and figuring out ways to get around them.

If you need help setting up the Geolocation Service, make sure you check out my post:

The Missing Area Code

Sitecore gives you a really nice set of rules to work with once you have the service enabled:

Looking above, you will see that the first rule is based on a visitor's area code.

Unfortunately, this rule doesn't work. After decompiling the GeoIP assembly, I was able to determine that the AreaCode property of the WhoIs object is never set in the MapGeoIpResponse processor:

And the rule that is supposed to use the area code:

So, make sure that you use the "postal code" condition if you plan on doing this type of personalization, and not the "area code" one.

The Missing Time Zone

One of my projects has a requirement around the ability to personalize based on the time of day.

When talking about this over lunch with some fellow Sitecore geeks, I got the "..doesn't Sitecore do that out of the box?" response.

At first thought, this seemed like a valid response, as Sitecore advertises that their IP Geolocation Service has the ability to identify a visitor's time zone.

Well, we know that there aren't any time-based rules shown by the Geolocation ruleset above, but if we have the person's time zone, we could at least use this information to conjure up some custom condition that will allow us to do this type of personalization.

Focusing on the GeoIpResponseProcessor again, you will notice that there is no time zone property on the WhoIs object. So, it's clear that we actually don't have a time zone to use.

After digging in a bit further, and confirming a few things with Sitecore support, I was able to determine that the JSON response from Sitecore's Geolocation service does actually contain the time zone information:

So, why wouldn't they make this available via the WhoIs object and API?

This seems a bit odd.

I registered a wish with my support ticket, so hopefully we will see this in a future release. Until then, we have to write a bit of code to peel off the value so that we can use it in our customization.

Time Zone Fun

Getting the Time Zone

In order to get that time zone value from the service, I had to create a custom processor to grab the raw value from service response.


Patch Configuration

Converting Olson to Windows Time

As you can see above, time zones from the service are in IANA time zone format (also known as Olson) as described here:

So the next order of business was to take the Olson time zone ID and convert it to a Windows time zone ID so that when I was ready to perform the local time calculation, it would be fairly easy using the .NET Framework's TimeZoneInfo class.

After a quick Google, I came across this Stack Overflow article that I based my conversion helper method on:

With these pieces on place, I had everything I needed to build out my time-based personalization rules!

Bonus: Testing GeoIP Lookups

To make GeoIP Lookup testing easy, I created a processor that injects a mock ip address obtained from a setting, so you can verify that your dependent rules work as expected.

Note: There is a dated module called Geo IP Tester on the Marketplace, but unfortunately it isn't compatible with Sitecore 8.x due to the changes in the API.


Patch Configuration

Sunday, May 22, 2016

How to ensure that Web API and Sitecore FXM can be implemented together

As a Sitecore MVC developer, implementing a Web API within Sitecore is pretty trivial. There are several informative posts on the web showing you exactly how to get this going.

If you haven't already checked it out, I recommend that you read through Anders Laub's "Implementing a WebApi service using ServicesApiController in Sitecore 8" post.

The Problem

Where this and other posts on the web fall short, is that they don't indicate the correct sweet spot to patch into the initialize pipeline that won't cause havoc if you plan to implement Sitecore's Federated Experience Manager (FXM).

Sitecore's Web API Controller related to the FXM module is decorated with the following attributes:

 [EnableCors("*", "*", "GET,POST", SupportsCredentials = true)]  

What this means is that Sitecore sets the "Access-Control-Allow-Origin" header value to the domain of every external site that is configured by FXM.

Placing your processor before the ServicesWebApiInitializer processor will remove these headers and result in FXM not being able to make cross-domain requests.

So for example, looking at Anders' example, a patch like this:

 <configuration xmlns:patch="">  
     <processor patch:after="processor[@type='Sitecore.Pipelines.Loader.EnsureAnonymousUsers, Sitecore.Kernel']"  
      type="LaubPlusCo.Examples.RegisterHttpRoutes, LaubPlusCo.Examples" />  
will result in this FXM error:

No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin '' is therefore not allowed access.

The Quick Fix

Fortunately, all that you need to do to fix this issue is to patch after Sitecore's ServicesWebApiInitializer.

So looking at the example again, this would fix the issue and result in FXM playing nice:

 <configuration xmlns:patch="">  
     <processor patch:after="processor[@type='Sitecore.Services.Infrastructure.Sitecore.Pipelines.ServicesWebApiInitializer, Sitecore.Services.Infrastructure.Sitecore']"  
      type="LaubPlusCo.Examples.RegisterHttpRoutes, LaubPlusCo.Examples" />  

Sunday, May 15, 2016

3-Step Guide: How to trigger an xDB goal using jQuery AJAX with Sitecore MVC



After cruising around the web looking for some code to use to trigger a goal using jQuery AJAX, I discovered that there weren't really any easy to understand, end-to-end, current examples of how to do this using Sitecore MVC.

So, I decided to write up a quick post to demonstrate how to do this in 3 easy steps.

Step 1 - Create MVC Controller

The first step is to create an MVC Controller with an action that you will use to trigger the goal:

 using System;  
 using System.Linq;  
 using System.Web.Mvc;  
 using Sitecore.Analytics;  
 using Sitecore.Analytics.Data.Items;  
 using Sitecore.Mvc.Controllers;  
 namespace MyNamespace.Controllers  
   public class AnalyticsController : SitecoreController  
     private const string DefaultGoalLocation = "/sitecore/system/Marketing Control Panel/Goals";  
     public ActionResult TriggerGoal(string goal)  
       if (!Tracker.IsActive || Tracker.Current == null)  
       if (Tracker.Current == null)  
         return Json(new { Success = false, Error = "Can't activate tracker" });  
       if (string.IsNullOrEmpty(goal))  
         return Json(new { Success = false, Error = "Goal not set" });  
       var goalRootItem = Sitecore.Context.Database.GetItem(DefaultGoalLocation);  
       var goalItem = goalRootItem.Axes.GetDescendants().FirstOrDefault(x => x.Name.Equals(goal, StringComparison.InvariantCultureIgnoreCase));  
       if (goalItem == null)  
         return Json(new { Success = false, Error = "Goal not found" });  
       var page = Tracker.Current.Session.Interaction.PreviousPage;  
       if (page == null)  
         return Json(new { Success = false, Error = "Page is null" });  
       var registerTheGoal = new PageEventItem(goalItem);  
       var eventData = page.Register(registerTheGoal);  
       eventData.Data = goalItem["Description"];  
       eventData.ItemId = goalItem.ID.Guid;  
       eventData.DataKey = goalItem.Paths.Path;  
       return Json(new { Success = true });  

Step 2 - Register a custom MVC route

The next step is to create a custom processor for the initialize pipeline and define custom route in the Process method similar to the following:

 using System;  
 using System.Collections.Generic;  
 using System.Linq;  
 using System.Web;  
 using System.Web.Mvc;  
 using System.Web.Routing;  
 using System.Web.UI.WebControls;

 using Sitecore.Pipelines;  
 namespace MyNamespace  
  public class RegisterCustomRoute  
   public virtual void Process(PipelineArgs args)  
   public static void Register()  
    RouteTable.Routes.MapRoute("CustomRoute", "MyCustomRoute/{controller}/{action}/{id}");  

Add this processor to the initialize pipeline right before the Sitecore InitializeRoutes processor. You can do this with the help of the patch configuration file in the following way:

 <?xml version="1.0" encoding="utf-8"?>  
 <configuration xmlns:patch="">  
     <processor type="MyNamespace.RegisterCustomRoute, MyAssembly"/>  

Step 3 - Trigger using jQuery

Finally, trigger the goal by name using a few lines of jQuery:

 $.post("/MyCustomRoute/Analytics/TriggerGoal?goal=tweet" ,function(data){  
      //Do something with data object  

Monday, April 18, 2016

Presentation Targets - Chuck Norris's version of Sitecore's Item Rendering



My current project is unique in the way that the Home Page of the site is designed. Basically, the Home Page is a single-page app, where the entire list of products will start appearing as you scroll down the page. They will be grouped by category, and as you keep scrolling down and reach a product list in a new category, the navigation will automatically change to reflect the new category.

It is one of those designs where the client gets all googly eyed over how pretty it looks and you as the Sitecore Architect are left thinking, "how in the world am I going to give my content authors a great experience when building this out in the Experience Editor?"

Well, an obvious approach would be to have one massively long Home Page with a ton of renderings plopped all over the show. But, I was left asking myself, "Self, we need to break this down into separate Product Category pages, so that my Content Authors will have a good experience and the pages will be easier to maintain.

The Problem

Great idea right? But, how would all the content from these pages be dynamically brought over to the Home Page so that we can do this fancy, animated show / hide trick?

I also had to make sure that I would be able to personalize everything based on the approach that I landed on. For example, depending on the visitors' identified persona, or the time of day, I wanted to be able to switch out the order in which the product categories would appear on the Home Page.

True Item Rendering

Doing a bit of research, I came across one of Vasiliy Fomichev's older posts regarding Sitecore's Item Rendering. We share the same underwhelming feeling about Sitecore's implementation of the Item Rendering, and reading further, I was pleasantly surprised to find that his requirements and "true item rendering solution" matched what I was looking for:

The Item Rendering must use the presentation components found in the presentation details of the referenced item

After I downloaded his example project, and fired it up, I realized that this would get me most of the way there; I could set the datasource to any content item that contained presentation components with set datasources, and it would render them in the location where my Item Rendering was placed.

The nice thing too was that I had full Experience Editor support.

So, I could modify the properties of both the Item Rendering and the Renderings within the Rendering (that's a tongue twister).


Presentation Targets is Born

My next thought to myself was "Self, how awesome would it be if I could drop this on a page somewhere and target specific renderings within specific placeholders that exist on another item?"

This would give me a fantastic amount of flexibility and ultimate re-usability of already built presentation components.

So, I decided to update the project and add the following improvements:

  1. A new Presentation Targets Rendering and updated patch file so that both the new rendering and out-of-the-box Item Rendering could live in harmony.
  2. Adjusted the code to allow rendering parameters to be passed through from the target renderings.
  3. Added the ability to set the placeholders and renderings that you would like to target on the datasource item. These are both set as pipe-separated (|) lists of rendering parameters:

So, what does this give me?

To put it simply; a rendering that can target any renderings inside any placeholders on an existing item and render their content.

As I mentioned earlier, you have the ability to modify the content of the targeted renderings within the location that you have placed the Presentation Targets Rendering within the Experience Editor.

So let's study this with an example use case:

  1. You worked very hard to build out a wonderful product carousel, and implemented a series of personalization rules on several of the slides.
  2. There is a promotion this week for a series of products that are part of the wonderful carousel, and so your boss wants you to add the carousel to the Home Page.
  3. Instead of adding a new carousel rendering to the page, setting up each of the slides with datasources and re-applying your personlization rules from scratch, you could place the Presentation Targets Rendering on the Home Page, set the datasource to the item that has the wonderful carousel, set the placeholder that it exists in, and the rendering id of the carousel.
  4. You are done!

The Possibility of Nested Personalization

While working with Lars Peterson and the SBOS team, this topic came a few times during the implementation of a Home Page carousel (yes, I just love carousels if you couldn't tell already).

Looking at the proposed use case: 
  • For a specific group of visitors, change the entire carousel's datasource so that it displays one that is relevant to the specific group.
  • For visitors that are outside of this group, personalize slide 1 with conditions based on geolocation.

So, basically allowing for the possibility to personalize the datasource of an entire carousel, and within that, personalize each individual slide.

Without Presentation Targets

This is achievable by having more than one carousel rendering where we apply a rule with the "Hide Component” set to only show the carousel with the personalized slides for our targeted visitor.

With Presentation Targets

Granted that you would have to initially set up the presentation of the carousels and slides on a seperate item, using the Presentation Targets Rendering will allow you to achieve this level of personalization because with Experience Editor support, you could set personalization rules on both the Presentation Targets Rendering and the Carousel's Slide Renderings.

Now, isn't that fancy?

Final Thoughts

The Presentation Targets module opens up some new opportunities to explore content reuse and the depths of personalization within the Experience Platform.

I will be sure to share more of my experiences as I dig in a bit deeper.

Full source code is available on GitHub:

Another shout-out to Vasiliy Fomichev on his awesome post to get the fire started!