I am trying to remove Personally Identifiable Information (PII) from the URLs of our Single Page Application (SPA) that are tracked through Google Tag Manager.

The URLs have the form /customer/1234/invoice/5678, which I want to send to GA4 as /customer/(redacted)/invoice/(redacted).

What I did is the following:

  1. In GTM, I created a Custom JavaScript variable called Page location without ids with the following content. (Note: I'm using {{Page URL}} here, but I also tried window.location.href, with the same effect.)
```javascript
function() {
  // including timestamp for debugging purposes
  var url = Date.now() + {{Page URL}}.replace(/\d{4}/g, '(redacted)');
  // outputting to console for debugging purposes
  console.log(url);
  return url;
}
```

  2. In the GA4 configuration tag (which fires on All Pages), I opened Fields to set and set the field page_location to {{Page location without ids}}.
  3. I started Preview in GTM and let GTM load the website. Tag Assistant comes up on the page, and GTM reports that it is connected.
  4. Everything looks fine so far:
    • I open the developer console on the website and see about 20 lines of output from my GTM script, each showing the start page URL with a timestamp.
    • In GTM's Tag Assistant, I can see the modified URL under Variables in both the GTM and GA4 containers (in the GTM container assigned to Page location without ids, in the GA4 container assigned to dl (Page Location)).
    • In GA4, I can see the modified URL in DebugView, assigned to the page_location parameter.
  5. However, when I navigate to a page with ids in the URL:
    • The console outputs the redacted URL, good. (Four times, actually; I don't know why.)
    • However, the payload of the collect call shows the (redacted) starting page URL in the dl parameter. The actual page URL (redacted or not) is not included.
    • GTM shows a History event logged by the GTM container with the redacted URL in the Page location without ids variable, good. The Page Path and Page URL variables, however, are not redacted; I don't know whether this is good or bad.
    • For the GA4 container, GTM shows a Page View with the (redacted) starting page URL in the dl (Page Location) parameter!
    • GA4 DebugView also shows the starting page URL as the page_location parameter.

So for some reason I am unable to push the redacted URL of the current page into the dl parameter for GA4; instead, GA4 keeps using the redacted initial (starting page) URL.
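Separately from the dl problem, two details of the variable above are worth pulling apart. It returns the debug timestamp as part of the value, so page_location receives a millisecond timestamp followed by the URL rather than a plain URL. And `/\d{4}/g` redacts any run of four digits anywhere in the URL, so a five-digit ID would keep a trailing digit. A minimal sketch of a redaction function, assuming that IDs are exactly the path segments made up only of digits (`redactIds` is a hypothetical name, and the query string and fragment are deliberately left alone). It sticks to ES5 syntax in case the container does not accept newer syntax; in a Custom JavaScript variable the helper would be declared inside the anonymous function, with {{Page URL}} passed in.

```javascript
// Hypothetical helper: replaces every purely numeric path segment with
// "(redacted)". Assumes IDs are the only all-digit segments; the query
// string and fragment are left unchanged.
function redactIds(url) {
  var u = new URL(url);
  u.pathname = u.pathname
    .split('/')
    .map(function (segment) {
      return /^\d+$/.test(segment) ? '(redacted)' : segment;
    })
    .join('/');
  return u.toString();
}

// Debug output stays separate from the returned value, so the value that
// would end up in page_location is a plain URL without a timestamp prefix.
console.log(Date.now(), redactIds('https://example.com/customer/1234/invoice/5678'));
```

Matching whole segments instead of fixed-length digit runs means `/customer/12345` and `/customer/1234` are redacted the same way, while digits elsewhere in the URL are untouched.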

Peter
  • OK, that's a bit better. Now look at the network tab and see which fields in the ?collect call contain the redacted variant of the URL and which don't. Also, the reason you see multiple console logs is that Preview evaluates all variables on every event, whether they're needed or not, for your convenience. And when you override dl, you don't touch Page Path and Page URL, because those live in GTM, not in GA. – BNazaruk Dec 08 '21 at 16:40
  • Looking in the network tab, each collect call contains a `dl` parameter that is the redacted initial (start page) URL (i.e., not the correct URL), and a `dr` (referrer) parameter that is the unredacted but actual previous page URL. – Peter Dec 09 '21 at 08:31
  • well, override the dr. GA has no way in the world to know the real page unless it's sent in the network call. Your offender is there. Also, make sure the tid in the call is equal to your property measurement id. – BNazaruk Dec 09 '21 at 16:40
  • Could you provide some info on how to "override the dr"? Can this be done in GTM? And does this help with the static `dl` (location) parameter? The `tid` parameter contains the correct GA4 Measurement ID. (Remember I am receiving data, it's just the location field that is not updated.) – Peter Dec 09 '21 at 17:02
  • Yeah, the reason I asked to look at tid is to make sure you're looking at the right call. Maybe you have several different GA4 properties tracking. It's common. You override the dr exactly how you override the dl. In GTM. dl is document location and dr is document referrer. Oh, also make sure you override these dimensions not only for pageviews, but for events too. – BNazaruk Dec 09 '21 at 18:16
  • Yes, I understand any URL must be redacted before it reaches GA. But my problem is that I am not able to send the right URL to GA on each virtual pageview; all I get is the initial starting page URL (see point 5 in my question). Do you have any idea how I can send the correct URL? – Peter Dec 13 '21 at 08:22
  • you still see the wrong URL when you explicitly set `page_location` in GTM for the GA4 tag? – BNazaruk Dec 13 '21 at 16:48
  • There has been no change in the results I described in the post. – Peter Dec 13 '21 at 16:50
  • Then start adding screenshots where you're setting the dimension, where the same tag fires having a correct variable value in preview and where the corresponding network request has a wrong dl. You made a mistake somewhere. – BNazaruk Dec 13 '21 at 17:04
  • Thanks for looking at this! I have added several screenshots. – Peter Dec 13 '21 at 17:40
  • please add the network tab `?collect` call screenshot with the dl field in it – BNazaruk Dec 13 '21 at 17:46
  • Included the Network tab now, second bullet under point 5. – Peter Dec 14 '21 at 09:26
  • I think you're blurring the redacted parts, so it's not obvious where the URL is redacted and where it's not. But what is that number at the beginning of the URL? Also, it's time to stop using the real-time debugger and look at the actual page reports to see whether the URLs are redacted there. – BNazaruk Dec 14 '21 at 16:41
  • No, I'm only blurring the hostname and the Analytics ID codes. The number before the URL is a timestamp; check the variable code that prepends `Date.now()` to the URL. Having a timestamp makes it more obvious that I keep getting the starting page URL, which has the lowest timestamp. Note that the starting page is also the one ending with `gtm_debug=xxxxxx`. (Also redacted, as I'm redacting all numbers in the URL.) Not sure what you mean by seeing actual page reports; I don't want to mess these up (as I'm not able to log any visited page with this code). – Peter Dec 15 '21 at 13:17
  • wow. it gets really murky. Okay. What do you mean by not being able to log any visited page? You obviously log pages. You have a screenshot from the debugger. – BNazaruk Dec 15 '21 at 16:54
  • True, Google Analytics is receiving information on every page visit, but I am not able to send the correct URL (page location). So what I meant is that I am unable to log the address of the visited page. The whole problem is that I keep getting the first page URL. That is logical in a way, since it is an SPA, but obviously not what I want. – Peter Dec 16 '21 at 08:05
  • @Peter have you managed to solve the problem? I have the same issue and cannot find a way to fix it (other than switching to server-side Google Tag Manager). – Mateusz Kozłowski Mar 29 '22 at 07:55
  • @MateuszKozłowski No, never found a solution... – Peter Apr 06 '22 at 14:05
  • @Peter @MateuszKozłowski see this link for inspiration: https://www.thyngster.com/how-to-redact-pii-data-from-google-analytics-4-hits It looks like monkey-patching `sendBeacon` is currently the only "normal" way to redact all the information that hits GA4 and to do it in one place, without fiddly click-through setup in GTM. I'm going to try it in my SPA very shortly; from a first quick test, it looks like it works. – Vladimir Tolstikov May 30 '22 at 15:20
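The last comment suggests intercepting the hits themselves instead of setting fields in GTM. A minimal sketch of that general idea (not the code from the linked article): wrap a `sendBeacon`-like function so that the `dl` (page location) and `dr` (referrer) parameters of GA4 hits pass through a redaction function before they are sent, which also covers the `dr` override discussed earlier. Assumptions, all labeled: GA4 hits go to a URL containing `/g/collect`, batched events carry their own parameters in the request body as one URL-encoded event per line, and `patchSendBeacon` is a hypothetical name. gtag.js may also use other transports (for example `fetch` or image requests), which this sketch does not cover.

```javascript
// Hypothetical sketch: wraps nav.sendBeacon so GA4 hits have their dl (page
// location) and dr (referrer) parameters redacted before sending.
// `nav` is normally window.navigator; `redact` is any string -> string function.
function patchSendBeacon(nav, redact) {
  var original = nav.sendBeacon;
  var PARAMS = ['dl', 'dr'];

  // Redacts the listed parameters in a URL-encoded query string.
  function redactQuery(query) {
    var params = new URLSearchParams(query);
    PARAMS.forEach(function (name) {
      if (params.has(name)) {
        params.set(name, redact(params.get(name)));
      }
    });
    return params.toString();
  }

  nav.sendBeacon = function (url, data) {
    // Assumption: GA4 hits go to a path containing "/g/collect".
    if (typeof url === 'string' && url.indexOf('/g/collect') !== -1) {
      var q = url.indexOf('?');
      if (q !== -1) {
        url = url.slice(0, q + 1) + redactQuery(url.slice(q + 1));
      }
      // Assumption: batched events sit in the body, one encoded event per line.
      if (typeof data === 'string') {
        data = data.split('\n').map(redactQuery).join('\n');
      }
    }
    // Call the original with nav as `this`, as sendBeacon requires.
    return original.call(nav, url, data);
  };
}
```

In a page this would be called as `patchSendBeacon(navigator, redact)` as early as possible, ideally before gtag.js loads, in case the library keeps its own reference to the original function. Re-serializing through `URLSearchParams` may change the encoding of untouched parameters (spaces become `+`, for example), which should decode to the same values.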

1 Answer

Well, no, there's really no need. The GA4 configuration tag (with its page view) only needs to fire once. After that, it watches history changes and tracks every page view on most SPAs. You only need the regular page view trigger, and should add more triggers only if your SPA doesn't issue history changes on navigation. But the vast majority of SPA frameworks don't make that mistake anymore.
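The answer relies on SPA navigations showing up as history changes; GTM's built-in History Change trigger, which produced the History event seen in Preview, works on the same signal. Roughly, such detection can be sketched as below. This illustrates the mechanism only and is not the actual gtag.js or GTM implementation: wrap `history.pushState`/`replaceState`, listen for `popstate`, and report URL changes. `watchHistory` and its `win` parameter (normally `window`) are hypothetical names.

```javascript
// Hypothetical sketch of SPA navigation detection. `win` is normally `window`;
// onNavigate(current, previous) is called whenever the URL actually changes.
function watchHistory(win, onNavigate) {
  var last = win.location.href;

  // Report a navigation only if the URL differs from the last one seen.
  function check() {
    var current = win.location.href;
    if (current !== last) {
      var previous = last;
      last = current;
      onNavigate(current, previous);
    }
  }

  // Wrap pushState and replaceState so programmatic navigations are seen.
  ['pushState', 'replaceState'].forEach(function (method) {
    var original = win.history[method];
    win.history[method] = function () {
      var result = original.apply(win.history, arguments);
      check();
      return result;
    };
  });

  // Back/forward navigation fires popstate instead.
  win.addEventListener('popstate', check);
}
```

The callback is where a page view with an explicit, redacted location could be sent, for example `gtag('event', 'page_view', { page_location: redact(current), page_referrer: redact(previous) })`, with `redact` being a hypothetical redaction function. If GA4's enhanced measurement also records page changes based on browser history events, that option would likely need to be turned off to avoid counting each navigation twice.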

You should actually try implementing it and then ask your question. Update your question when you encounter non-theoretical issues and we'll help.

BNazaruk
  • You are correct, should have been more detailed on what exactly I tried. Changed the post now. – Peter Dec 08 '21 at 10:07