Thursday, July 23, 2015

FXM Experience Editor - Cleaning Up Problematic External Content


Background

I have been using Federated Experience Manager (FXM) a lot, and it has worked extremely well in almost all of our client deployments.




I recently ran into an issue on one of our clients' production instances, where an external site would not load in the FXM Experience Editor. We could see the page start loading, and then after a few seconds it would stop, leaving a blank page with a little bit of script. This left us scratching our heads for quite some time.

Using Fiddler's AutoResponder feature, we eventually determined that if we disabled an Adobe Dynamic Tag Management script library, the external site loaded successfully in the Experience Editor.

Our problematic script:






Our client made it very clear that removing the script from the external site was not an option. So, we needed to find a way to remove the script from the external site when loading it in the Experience Editor only.

Finding the Sweet Spot

I was almost certain that there had to be a pipeline I could hook into. Armed with the instance's "showconfig" output and my favorite decompiler (JetBrains dotPeek), I dug around until I discovered the following pipeline:

<content.experienceeditor>
 <processor type="Sitecore.FXM.Client.Pipelines.ExperienceEditor.ExternalPage.GetExternalPageContentProcessor, Sitecore.FXM.Client"/>
 <processor type="Sitecore.FXM.Client.Pipelines.ExperienceEditor.ExternalPage.UpdateBeaconScriptPathProcessor, Sitecore.FXM.Client"/>
 <processor type="Sitecore.FXM.Client.Pipelines.ExperienceEditor.ExternalPage.InjectControlsProcessor, Sitecore.FXM.Client"/>
 <processor type="Sitecore.FXM.Client.Pipelines.ExperienceEditor.ExternalPage.AddPlaceholderData, Sitecore.FXM.Client"/>
</content.experienceeditor>

This looked very promising indeed! Next step was to crack open the GetExternalPageContentProcessor.

Voilà, I found exactly what I was looking for: the point at which FXM fetches the content from the external site and stores it in args.ExternalPageContent, where the rest of the processors can access it (lines 19 & 20):

1:  public void Process(ExternalPageExperienceEditorArgs args)  
2:    {  
3:     Assert.ArgumentNotNull((object) args, "args");  
4:     Assert.ArgumentNotNull((object) args.MatcherContextItem, "MatcherContextItem");  
5:     Assert.ArgumentNotNull((object) args.ExperienceEditorUrl, "ExperienceEditorUrl");  
6:     string externalPageUrl = this.GetExternalPageUrl(args);  
7:     if (string.IsNullOrEmpty(externalPageUrl))  
8:     {  
9:      args.AbortPipeline();  
10:     }  
11:     else  
12:     {  
13:      string experienceEditorUrl = this.GetBaseExperienceEditorUrl(args);  
14:      HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Get, externalPageUrl);  
15:      request.Headers.Add("FxmReferrer", (IEnumerable<string>) new string[1]  
16:      {  
17:       experienceEditorUrl  
18:      });  
19:      HttpResponseMessage httpResponseMessage = this.externalSiteWebProxy.MakeRequest(string.Format("{0}&url={{0}}", (object) experienceEditorUrl), request);  
20:      args.ExternalPageContent = httpResponseMessage.Content.ReadAsStringAsync().Result;  
21:     }  
22:    }  

FXM External Content Sanitizer Processor

Yes, the name is a mouthful!

Knowing that this type of problem was bound to pop up again, I decided to write a custom processor that accepts a series of regular expressions from configuration and uses them to strip out any content that prevents the external site from loading in the FXM Experience Editor.

Processor

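using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;
using Sitecore.Configuration;   // Factory
using Sitecore.Diagnostics;     // Assert
using Sitecore.Xml;             // XmlUtil
// Namespace assumed from the processor type names in the pipeline above
using Sitecore.FXM.Client.Pipelines.ExperienceEditor.ExternalPage;

namespace MyProject.Domain.Pipelines.ExperienceEditor.ExternalPage
{
  public class SanitizeContent : IExternalPageExperienceEditorProcessor
  {
    private static readonly string _sanitizeNode = "/sitecore/fxmSanitizeExternalContent";

    public void Process(ExternalPageExperienceEditorArgs args)
    {
      Assert.ArgumentNotNull(args, "args");
      Assert.ArgumentNotNull(args.ExternalPageContent, "ExternalPageContent");

      // Apply each configured pattern in turn, removing every match from the fetched HTML
      foreach (var regex in GetRegexList())
      {
        args.ExternalPageContent = Regex.Replace(args.ExternalPageContent, regex, string.Empty);
      }
    }

    /// <summary>
    /// Returns list of strings containing regular expressions that have been set in configuration
    /// </summary>
    private static IEnumerable<string> GetRegexList()
    {
      var regexList = new List<string>();
      var configNode = Factory.GetConfigNode(_sanitizeNode);
      if (configNode == null)
      {
        // The configuration node is missing, so there is nothing to strip
        return regexList;
      }

      foreach (XmlNode childNode in configNode.ChildNodes)
      {
        // Skip XML comments and other non-element nodes
        if (childNode.NodeType != XmlNodeType.Element) continue;

        var value = XmlUtil.GetAttribute("value", childNode);
        if (!string.IsNullOrEmpty(value)) regexList.Add(value);
      }
      return regexList;
    }
  }
}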

Configuration

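You can duplicate line 4 to add as many regular expressions as you need. In my configuration, I added a regular expression (thanks to Andy Uzick for the help) that strips out any script tag containing the string "adobedtm". Line 10 patches the processor in directly after GetExternalPageContentProcessor, so the content is cleaned before the remaining processors touch it.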

1:  <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">  
2:   <sitecore>  
3:    <fxmSanitizeExternalContent>  
4:     <sanitizeRegex value="&lt;script[^&lt;]*(adobedtm)[\s\S]*?&lt;/script&gt;"/>  
5:    </fxmSanitizeExternalContent>  
6:    <pipelines>  
7:     <group groupName="FXM" name="FXM">  
8:      <pipelines>  
9:       <content.experienceeditor>  
10:        <processor type="MyProject.Domain.Pipelines.ExperienceEditor.ExternalPage.SanitizeContent, MyProject.Domain" patch:after="processor[@type='Sitecore.FXM.Client.Pipelines.ExperienceEditor.ExternalPage.GetExternalPageContentProcessor, Sitecore.FXM.Client']" />  
11:       </content.experienceeditor>  
12:     </pipelines>  
13:    </group>  
14:   </pipelines>  
15:  </sitecore>  
16: </configuration>  
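
To see what the configured pattern does without a Sitecore instance, here is a minimal console sketch. It reads the pattern from an XML fragment shaped like the configuration above (the XML parser decodes `&lt;` and `&gt;` back to `<` and `>`) and applies it to a sample page. The class name `SanitizeDemo`, the sample markup, and the script URLs are made up for illustration; the real processor gets its node from Sitecore's configuration factory rather than from a string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

public static class SanitizeDemo
{
    // Same element and attribute names as the configuration above.
    public const string ConfigXml =
        @"<fxmSanitizeExternalContent>
            <sanitizeRegex value=""&lt;script[^&lt;]*(adobedtm)[\s\S]*?&lt;/script&gt;""/>
          </fxmSanitizeExternalContent>";

    // Hypothetical page markup; both script URLs are invented for this example.
    public const string SampleHtml =
        "<head>" +
        "<script src=\"//assets.adobedtm.example/satelliteLib.js\"></script>" +
        "<script src=\"/js/site.js\"></script>" +
        "</head>";

    // Collects the "value" attribute of every child element.
    public static List<string> ReadPatterns(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var patterns = new List<string>();
        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            // Ignore comments and other non-element nodes.
            if (node.NodeType != XmlNodeType.Element) continue;

            var value = ((XmlElement)node).GetAttribute("value");
            if (!string.IsNullOrEmpty(value)) patterns.Add(value);
        }
        return patterns;
    }

    // Removes every match of every pattern, in configuration order.
    public static string Sanitize(string html, IEnumerable<string> patterns)
    {
        foreach (var pattern in patterns)
        {
            html = Regex.Replace(html, pattern, string.Empty);
        }
        return html;
    }

    public static void Main()
    {
        var patterns = ReadPatterns(ConfigXml);
        Console.WriteLine(patterns[0]);
        Console.WriteLine(Sanitize(SampleHtml, patterns));
    }
}
```

Only the script tag whose opening tag mentions "adobedtm" is removed; the second script tag is left in place because `[^<]*` cannot cross into another tag while looking for the keyword.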

Problem solved!

With the script removed, the external site loaded up in the FXM Experience Editor, and we were able to complete the tasks that we had originally set out to do.

I hope that this helps others who run into this same issue.




2 comments:

  1. Great post. I suggest you do not use Regular Expressions for HTML. It's fine in this instance, but there is nothing regular about HTML and the many weird and wonderful syntaxes that it allows and a browser forgives but a regex would not... in case you've never come across it, this Stack Overflow answer is both hilarious and informative :)

    http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

    Instead, I suggest you use HTMLAgilityPack [http://htmlagilitypack.codeplex.com/] or, if you want a more jQuery-esque syntax (my preference), use FizzlerEx [http://fizzlerex.codeplex.com/] or CsQuery [https://github.com/jamietre/CsQuery]. It also has the advantage that you don't pull your hair out trying to figure out regular expression syntax!

  2. Appreciate the feedback, Kamruz! Andy Uzick suggested the same thing as a powerful enhancement. I will update the post in the future with this implementation. Cheers!
