Thursday, September 20, 2018

Sitecore GeoIP - A Developer's Guide To What Has Changed In Sitecore 9

In my previous post, I took a dive into the 8.x version of the Sitecore GeoIP service from a developer's point of view. Sitecore 9 introduced significant improvements to xDB, and GeoIP is one of the features that changed.

In this post, I intend to help developers understand what has changed in Sitecore 9 GeoIP. As in my previous post, the purpose is to arm developers with the details they need to understand what is happening under the hood, so that they can successfully troubleshoot a problem if one arises.


Reference Data

One of the first things that I discovered when diving into version 9 is the use of a series of "ReferenceDataClientDictionaries" that are exposed to us as "KnownDataDictionaries".

As the name implies, these are known collections that store common data, one of which is IP geolocation data. The data is ultimately stored in a SQL database, so that it can be referenced throughout the Experience Platform.

There is a new pipeline in Sitecore 9 that initializes these dictionaries, as shown here:

Config: 

<initializeKnownDataDictionaries patch:source="Sitecore.Analytics.Tracking.config">
<processor type="Sitecore.Analytics.DataAccess.Pipelines.InitializeKnownDataDictionaries.InitializeKnownDataDictionariesProcessor, Sitecore.Analytics.DataAccess"/>
<processor type="Sitecore.Analytics.XConnect.DataAccess.Pipelines.InitializeKnownDataDictionaries.InitializeDeviceDataDictionaryProcessor, Sitecore.Analytics.XConnect" patch:source="Sitecore.Analytics.Tracking.Database.config"/>
</initializeKnownDataDictionaries>

Processor Code:

namespace Sitecore.Analytics.DataAccess.Pipelines.InitializeKnownDataDictionaries
{
  public class InitializeKnownDataDictionariesProcessor : InitializeKnownDataDictionariesProcessorBase
  {
    public override void Process(InitializeKnownDataDictionariesArgs args)
    {
      Condition.Requires<InitializeKnownDataDictionariesArgs>(args, nameof (args)).IsNotNull<InitializeKnownDataDictionariesArgs>();
      GetDictionaryDataPipelineArgs args1 = new GetDictionaryDataPipelineArgs();
      GetDictionaryDataPipeline.Run(args1);
      Condition.Ensures<DictionaryBase>(args1.Result).IsNotNull<DictionaryBase>("Check configuration, 'getDictionaryDataStorage' pipeline  must set args.Result property with instance of DictionaryBase type.");
      args.LocationsDictionary = new LocationsDictionary(args1.Result);
      args.ReferringSitesDictionary = new ReferringSitesDictionary(args1.Result);
      args.GeoIpDataDictionary = new GeoIpDataDictionary(args1.Result);
      args.UserAgentsDictionary = new UserAgentsDictionary(args1.Result);
      args.DeviceDictionary = new DeviceDictionary(args1.Result);
    }
  }
}

If you look at line 13 above, where args.GeoIpDataDictionary is assigned, the GeoIpDataDictionary object being created inherits from Sitecore's new ReferenceDataDictionary.

This is the glue between GeoIP and the new Reference Data "shared storage" mechanism.

Here is what the code looks like:

namespace Sitecore.Analytics.DataAccess.Dictionaries
{
  public class GeoIpDataDictionary : ReferenceDataDictionary<Guid, GeoIpData>
  {
    public GeoIpDataDictionary(DictionaryBase dictionary, int cacheSize)
      : base(dictionary, "GeoIpDataDictionaryCache", XdbSettings.GeoIps.CacheSize * cacheSize)
    {
      this.ReadCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsReads;
      this.WriteCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsWrites;
      this.CacheHitCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsCacheHits;
      this.DataStoreReadCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsDataStoreReads;
      this.DataStoreReadTimeCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsDataStoreReadTime;
      this.DataStoreWriteTimeCounter = AnalyticsDataAccessCount.DataDictionariesGeoIpsDataStoreWriteTime;
    }

    public GeoIpDataDictionary(DictionaryBase dictionary)
      : this(dictionary, XdbSettings.GeoIps.CacheSize)
    {
    }

    public override TimeSpan CacheExpirationTimeout
    {
      get
      {
        return TimeSpan.FromSeconds(600.0);
      }
    }

    public override Guid GetKey(GeoIpData value)
    {
      return value.Id;
    }

    public string GetKey(Guid id)
    {
      return id.ToString();
    }
  }
}


Notice on line 25 that CacheExpirationTimeout returns TimeSpan.FromSeconds(600.0), so entries in this dictionary's cache expire after 10 minutes. More on this below.
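To make the effect of a fixed expiration timeout concrete, here is a minimal sketch of a cache whose entries go stale after a set TimeSpan. ExpiringCache, its members, and the injectable clock are hypothetical names of my own, not Sitecore types, and the real ReferenceDataDictionary cache may well behave differently in its details.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a cache with a fixed expiration timeout.
// This is NOT Sitecore's implementation; it only illustrates expiry.
public class ExpiringCache<TKey, TValue>
{
    private readonly Dictionary<TKey, (TValue Value, DateTime ExpiresAt)> entries =
        new Dictionary<TKey, (TValue Value, DateTime ExpiresAt)>();
    private readonly TimeSpan timeout;
    private readonly Func<DateTime> clock;

    // The clock is injected so that expiry can be simulated without waiting.
    public ExpiringCache(TimeSpan timeout, Func<DateTime> clock)
    {
        this.timeout = timeout;
        this.clock = clock;
    }

    public void Add(TKey key, TValue value)
    {
        // Each entry records the moment after which it is treated as stale.
        entries[key] = (value, clock() + timeout);
    }

    public bool TryGet(TKey key, out TValue value)
    {
        if (entries.TryGetValue(key, out var entry) && clock() < entry.ExpiresAt)
        {
            value = entry.Value;
            return true;
        }

        // Missing or expired: drop any stale entry and report a miss.
        entries.Remove(key);
        value = default(TValue);
        return false;
    }
}
```

Constructed with TimeSpan.FromSeconds(600.0), an entry added at 12:00:00 is still returned at 12:09:59 and is reported as a miss from 12:10:00 onwards, at which point the caller has to go back to the slower store.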

Reference Data Storage and the GeoIP Lookup Flow

You may be wondering how this Reference Data feature changes what you know about the GeoIP flow from previous versions of the platform.

Let's review the steps:

  • Sitecore runs the CreateVisits pipeline. Within this pipeline, the UpdateGeoIpData processor triggers a call to GeoIpManager.GetGeoIpData from within Sitecore.Analytics.Tracking.CurrentVisitContext, which initiates the GeoIP lookup for the visitor's interaction.

  • Sitecore performs a GeoIP data lookup in the GeoIP memory cache.
    • NOTE: Cache expiration is set to 10 seconds => TimeSpan.FromSeconds(10.0)

Sitecore.Analytics.Lookups.GeoIpCache:

    public void Add(GeoIpHandle handle)
    {
      Assert.ArgumentNotNull((object) handle, nameof (handle));
      if (this.cache.Count >= this.maxCount)
        this.Scavenge();
      this.cache.Add(handle.Id, (object) handle, TimeSpan.FromSeconds(10.0));
      AnalyticsTrackingCount.GeoIPCacheSize.Value = (long) this.cache.Count;
    }

  • If the GeoIP data IS in the GeoIP memory cache, Sitecore attaches it to the visitor's interaction.

  • If the GeoIP data IS NOT in the GeoIP memory cache, it performs a lookup in the Reference Data's GeoIpDataDictionary (KnownDataDictionaries) memory cache.
    • NOTE: Cache expiration is set to 10 minutes => TimeSpan.FromSeconds(600.0). See above for the 10 minute CacheExpirationTimeout property on the Sitecore.Analytics.DataAccess.Dictionaries.GeoIpDataDictionary class.

  • If the GeoIP data IS in the Reference Data's GeoIpDataDictionary memory cache, it attaches it to the visitor's interaction and adds it to the GeoIP memory cache.

  • If the GeoIP data IS NOT in the Reference Data's GeoIpDataDictionary memory cache, it performs a lookup in the SQL ReferenceData database and if found, stores the result in the Reference Data's GeoIpDataDictionary cache and GeoIP memory cache, and then attaches it to the visitor's interaction.

  • If the GeoIP data IS NOT in the SQL ReferenceData database, it performs a lookup using the Sitecore Geolocation service and stores the result in the SQL ReferenceData database, the Reference Data's GeoIpDataDictionary cache and GeoIP memory cache, and then attaches it to the visitor's interaction.
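The steps above amount to a tiered, read-through lookup, which can be sketched roughly as follows. This is a simplified model under my own assumptions: TieredGeoIpLookup, Resolve, and ClearGeoIpCache are hypothetical names, the tiers are plain in-memory stand-ins rather than Sitecore types, a string stands in for GeoIpData, and cache expiry is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the lookup order described above. The tiers are
// in-memory stand-ins, NOT Sitecore types, and cache expiry is omitted.
public class TieredGeoIpLookup
{
    private readonly Dictionary<Guid, string> geoIpCache = new Dictionary<Guid, string>();
    private readonly Dictionary<Guid, string> dictionaryCache = new Dictionary<Guid, string>();
    private readonly Dictionary<Guid, string> referenceDataStore;
    private readonly Func<Guid, string> geolocationService;

    public TieredGeoIpLookup(
        Dictionary<Guid, string> referenceDataStore,
        Func<Guid, string> geolocationService)
    {
        this.referenceDataStore = referenceDataStore;
        this.geolocationService = geolocationService;
    }

    // Returns the location and reports which tier answered the request.
    public string Resolve(Guid id, out string servedBy)
    {
        // Tier 1: the short-lived GeoIP memory cache.
        if (geoIpCache.TryGetValue(id, out var location))
        {
            servedBy = "GeoIpCache";
            return location;
        }

        if (dictionaryCache.TryGetValue(id, out location))
        {
            // Tier 2: the GeoIpDataDictionary memory cache.
            servedBy = "GeoIpDataDictionary cache";
        }
        else if (referenceDataStore.TryGetValue(id, out location))
        {
            // Tier 3: the SQL ReferenceData database; fill the tier above.
            servedBy = "ReferenceData database";
            dictionaryCache[id] = location;
        }
        else
        {
            // Tier 4: the geolocation service; persist and fill the tiers above.
            servedBy = "Geolocation service";
            location = geolocationService(id);
            referenceDataStore[id] = location;
            dictionaryCache[id] = location;
        }

        // Any answer from a slower tier is copied into the fastest cache.
        geoIpCache[id] = location;
        return location;
    }

    // Simulates the 10-second expiry of the GeoIP memory cache.
    public void ClearGeoIpCache()
    {
        geoIpCache.Clear();
    }
}
```

In this model, the first Resolve call for a new key is answered by the geolocation service and an immediate second call by the GeoIP memory cache. After ClearGeoIpCache, the same key is answered by the dictionary cache, and a fresh instance that shares the same store is answered by the database tier.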

Reference Data Storage in SQL

Opening the ReferenceData database's DefinitionTypes table in SQL Server Management Studio shows the different types of reference data that are being stored. The GeoIP data type is named "Tracking Dictionary - GeoIpData".


Looking at the Definitions table, you can see that the data itself is stored as a Binary data type.


The following SQL Query will return the top 100 GeoIP reference data results:

SELECT TOP 100 dt.Name, d.Data, d.IsActive, d.LastModified, d.Version
FROM [xdb_refdata].[Definitions] AS d
INNER JOIN [xdb_refdata].[DefinitionTypes] AS dt ON dt.ID = d.TypeID
WHERE dt.Name = 'Tracking Dictionary - GeoIpData'



Changes to the GeoIpManager class

Finally, I wanted to provide a glimpse of the changes in the GeoIpManager class that I referenced in my previous post.

Comparing the 8.x version of the GeoIpManager code with version 9 shows that the class now uses the KnownDataDictionaries.GeoIPs dictionary instead of Tracker.Dictionaries.GeoIpData (the ContactLocation class) from 8.x.



Final Words

I hope that this information helps developers understand more about Reference Data and the updated GeoIP Lookup Flow in Sitecore 9.

As always, feel free to comment or reach me on Slack or Twitter if you have any questions.

