26

I am using Google Analytics 4 (GA4) on the client to track a whole bunch of different events. However, there are 2 scenarios that I can't cover client side:

  1. A user completing check out on a payment page hosted by a third-party (Stripe in this case).
  2. A refund that is made by the support team.

These events are handled by the server using webhooks. To me, the most straightforward solution seems to be letting the server send the event to GA4 (as opposed to the client sending it). I believe the Measurement Protocol should be used for this.

For each event submitted through the Measurement Protocol a client_id is required. When the client is submitting an event, this is an automatically generated ID which is used to track a particular device.

My question thus is, what should the client_id be when submitting an event server-side?

Should the same client_id perhaps be used for all events, so that the server is recognized as one device? I have read some people proposing a randomly generated client_id for each event, but this would result in a new user being recorded for every server-side event...


EDIT: One of the answers proposes to use the client_id, which is part of the request as a cookie. However, for both examples given above, this cookie is not present as the request is made by a third-party webhook and not by the user.

I could of course store the client_id in the DB, but the refund in the second example is issued by the support team. Conceptually, it feels odd to associate that event with the user's client_id, as the client_id is just a way to recognize the user's device. I.e. it is not the user's device which triggered the refund event here.

Another refund event example would be when user A makes a purchase with user B and user B refunds this purchase a week later. In this situation, should the client_id be the one of user A or of user B? Again, it feels odd to use a stored client_id here. Because, what if user A is logged in on two devices? Which client_id should be used here then?

Marnix.hoh

2 Answers

13

Great question. Yes, your aim to use Measurement Protocol is a proper solution here.

  1. Do not hardcode the client id. It's gonna be a hellish mess in reports. The nature of user-based reporting (which GA is) demands that client ids uniquely identify users, to the best of your ability.
  2. GA stores the client id in a cookie. You should have convenient and immediate access to it on every client hit to the BE. The cookie name is _ga. GA4 additionally sets a cookie with the measurement id appended to the name (_ga_<container-id>), which holds session state. Here are Google's docs on it: https://developers.google.com/analytics/devguides/collection/analyticsjs/cookie-usage But you can easily find it if you inspect "collect" hits and look at their payloads. There's another cookie named _gid that contains a different value. It also distinguishes users, but it expires after 24 hours. Set it too if you can, but don't use it for the normal client id. It has a different purpose. Here's how the cookie looks here, on Stack Overflow:

[Screenshot: the _ga cookie in the browser's cookie storage]

And here it is in the Network tab. You will need it for proper debugging, mostly to make sure your FE client ids are the same as your BE client ids:

[Screenshot: the client id in the payload of a "collect" request]

  3. Keep an eye on the cases when the cookie is not set. When a cookie is not set, that most frequently means the user is using an ad-blocker. Your analysts will still want to know that the transaction happened even if there's a lack of context about the user. You still can track them properly.

    3.1 The laziest solution would be giving them an "AnonymousUser" client id and then append a random number to that so that it would both indicate that a user is anonymous and still make it possible for GA to separate them.

    3.2 A better solution would be to make a fingerprint client id for such users, say, by hashing a concatenated string of their user agent + IP + locale + screen resolution. It is up to your analysts to work out the definition of a unique user when the Google Analytics library is unable to do it.

    3.3 Finally, one of the best solutions would be to generate a client id on your own, keeping GA's format and maybe adding an indicator that it has been generated on your end (just for easier debugging in the future), then set it as a cookie and use it instead of _ga. Just use a different cookie name so that ad-blockers wouldn't know to block it.

  4. If you want to indicate that a hit was sent through the server, that's a good idea. Use a custom dimension for that. Just sync it with your analysts first. Maybe they wouldn't want that, or maybe they would want it in a different dimension.
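The cookie lookup and the fallbacks 3.2 and 3.3 above could be sketched roughly as follows. This assumes the common `_ga` layout `GA1.<n>.<random>.<timestamp>`, where the client id is the last two dot-separated fields; all function names and the `fp.` prefix are made up for illustration and are not part of any Google library.

```python
import hashlib
import secrets
import time
from typing import Optional


def client_id_from_ga_cookie(cookie_value: Optional[str]) -> Optional[str]:
    """Return the client id embedded in a `_ga` cookie value, or None.

    Assumption: the value looks like `GA1.<n>.<random>.<timestamp>` and the
    client id is the last two dot-separated numeric fields.
    """
    if not cookie_value:
        return None
    parts = cookie_value.split(".")
    if len(parts) < 4 or not (parts[-2].isdigit() and parts[-1].isdigit()):
        return None
    return f"{parts[-2]}.{parts[-1]}"


def fingerprint_client_id(user_agent: str, ip: str, locale: str, resolution: str) -> str:
    """Option 3.2: derive a stable id by hashing request attributes.

    The `fp.` prefix is a hypothetical marker that makes such ids easy to spot.
    """
    raw = "|".join([user_agent, ip, locale, resolution])
    return "fp." + hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


def generate_client_id() -> str:
    """Option 3.3: generate an id in GA's `<random>.<timestamp>` shape."""
    random_part = secrets.randbelow(2**31 - 1) + 1
    return f"{random_part}.{int(time.time())}"


def resolve_client_id(ga_cookie: Optional[str], stored_id: Optional[str] = None) -> str:
    """Prefer the cookie, then an id stored earlier, then a fresh one."""
    return client_id_from_ga_cookie(ga_cookie) or stored_id or generate_client_id()
```

For example, `resolve_client_id("GA1.2.1234567890.1623456789")` yields `"1234567890.1623456789"`, while a request without the cookie falls through to the stored or generated id.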

Now, this is very trivial. There are ways to go much deeper and to improve the quality of data from here. Like gluing the order id, the transaction id, the user id to that, using them to generate client id, do some custom client tracking for the future. But I must say that it's better than what more than 90% of, say, shopify clients have.
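As a rough illustration of the basic setup, a server-side refund hit through the Measurement Protocol might look like the sketch below. The endpoint and the `client_id` / `events` body shape follow the Measurement Protocol for GA4; the `event_source` parameter is a hypothetical custom parameter for marking server hits (it would need to be registered as a custom dimension to show up in reports), and the helper names are made up.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

MP_ENDPOINT = "https://www.google-analytics.com/mp/collect"


def build_refund_payload(client_id: str, transaction_id: str, value: float,
                         currency: str, user_id: Optional[str] = None) -> dict:
    """Build the JSON body for a single `refund` event."""
    params = {
        "transaction_id": transaction_id,
        "value": value,
        "currency": currency,
        "event_source": "server",  # hypothetical custom parameter
    }
    payload = {"client_id": client_id, "events": [{"name": "refund", "params": params}]}
    if user_id is not None:
        payload["user_id"] = user_id  # optional: ties the hit to a known user
    return payload


def build_request(measurement_id: str, api_secret: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    query = urllib.parse.urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    return urllib.request.Request(
        f"{MP_ENDPOINT}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the hit and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping payload construction separate from sending makes it easy to log or inspect the exact body when comparing FE and BE client ids.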

Also, GA4 is not good enough for deeper production usage. Many things there are still very rudimentary and lacking. I would suggest concentrating on Universal Analytics and having GA4 as a backup for when Google makes GA4 actually good enough to replace UA. That is, unless you're downloading your data elsewhere and not using GA's interface for analysis.

BNazaruk
  • Thank you so much for your elaborate answer! You present some very useful concepts, which I will definitely refer to. I have added an edit to my question to elaborate a bit on as to why I don't think the `client_id` provided by the cookie works for some situations. Please let me know if anything in my question and/or edit is unclear. Thank you so much! – Marnix.hoh Aug 14 '21 at 07:56
  • still, the concept of behavioral tracking insists on client ids. Data without it or with it corrupted will be an unusable mess. It will skew reports a lot too. It's better to not have this data than to bind it to one client id. You can bind it to a real user id, but it's complex in GA. Storing it in a DB is not elegant. And it's gonna be annoying to clean it. It may be best in this case to just generate it on the fly randomly, but it would be better to find a way to get that cookie, maybe include it as an optional field. But yeah if it's a third party sending it, well then it's more difficult. – BNazaruk Aug 14 '21 at 09:23
  • Thank you for your comment. It surprises me that while GA is also offering ecommerce tracking it does not offer a clean way to handle these server events. Seems like especially the refund case is one that many ecommerce websites have too. Not being able to track these would result in incorrect revenue data... – Marnix.hoh Aug 15 '21 at 13:23
  • @Marnix.hoh if you're unable to save the client ID in your database, you can pass metadata fields when building the Stripe Checkout Session; they are saved with the transaction. When webhooks are received, use that metadata value for the Measurement Protocol. – Rob Olmos Dec 06 '21 at 22:50
  • @BNazaruk is there any way to get client_id without using the browser? For example, server side applications – HeelMega Apr 19 '22 at 17:18
  • sure, client id is just a user-persistent randomly generated number. It uniquely identifies a user. front-end cookies are comfortable to use for it. You could implement something like this on the backend too, but how to persist the value without cookies would be up to you. – BNazaruk Apr 19 '22 at 20:47
  • Thanks for all of this info and sorry for the bump. I’m using MP to send events of successful sign_ups and using the recommended event. I did as suggested here and save the _ga cookie value in my database when the person registers. This is used as the client_id when making the MP request. When looking at conversions in my GA, it says that every sign_up event comes from Direct. It feels like it’s not connecting my server event with other events based on the client_id. Is this normal or am I missing something? – Cous Oct 11 '22 at 19:48
  • Great question deserving a separate post. Quick answer is, to connect them properly, you would also need the session id. The session id is something new that GA4 introduces, you should be able to find it in the same _ga cookie. Inspect your network request to the ?collect endpoint on web. See the sid? You need to set it in MP too. – BNazaruk Oct 11 '22 at 23:02
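The two suggestions in the comments above (carrying the ids through Stripe metadata, and passing a session id along with the client id) could be combined roughly as in this sketch. It assumes the client id and session id were captured on your own page before redirecting to Stripe; the metadata keys `ga_client_id` / `ga_session_id` and both function names are made up. The webhook event shape (`data.object.metadata`) follows Stripe's `checkout.session.completed` event, and `session_id` is sent as an event parameter. Monetary fields are left out to keep the sketch short.

```python
from typing import Optional


def checkout_session_metadata(client_id: str, session_id: Optional[str] = None) -> dict:
    """Metadata to attach when creating the Stripe Checkout Session.

    Stripe metadata values are strings, so both ids are passed as strings.
    """
    metadata = {"ga_client_id": client_id}
    if session_id:
        metadata["ga_session_id"] = session_id
    return metadata


def purchase_payload_from_webhook(event: dict) -> Optional[dict]:
    """Build a Measurement Protocol body from a checkout.session.completed event.

    Returns None when the ids were never attached, so the caller can decide
    on a fallback (for example a generated client id).
    """
    session = event["data"]["object"]
    metadata = session.get("metadata") or {}
    client_id = metadata.get("ga_client_id")
    if not client_id:
        return None
    params = {"transaction_id": session["id"]}
    if metadata.get("ga_session_id"):
        # Lets GA4 attach the server hit to the session that started checkout.
        params["session_id"] = metadata["ga_session_id"]
    return {"client_id": client_id, "events": [{"name": "purchase", "params": params}]}
```

The dictionary returned by `checkout_session_metadata` would be passed as the `metadata` argument when the Checkout Session is created, and Stripe echoes it back in the webhook.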
2

It seems that this page (relevant portion in the screenshot below) advises sending the data along with either the client_id or the user_id. However, it fails to address the fact that client_id is a mandatory field, as stated here.

I believe it is probably safe to assume that randomly generating this field should work. At least it seems to on my end; however, be warned that I am unsure whether this has any impact on attribution.

[Screenshot: GA4 Help Page]

* In the above image, Device ID refers to client_id
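One way to check that a payload with a randomly generated client_id is at least structurally accepted is the Measurement Protocol validation endpoint, which returns a list of `validationMessages` instead of recording the event. The sketch below assumes a UUID is an acceptable random client_id; the helper names are made up. It says nothing about attribution, which the validation endpoint does not cover.

```python
import json
import urllib.parse
import urllib.request
import uuid

DEBUG_ENDPOINT = "https://www.google-analytics.com/debug/mp/collect"


def random_client_id() -> str:
    """Assumption: any unique string works as a client_id; a UUID is used here."""
    return str(uuid.uuid4())


def debug_url(measurement_id: str, api_secret: str) -> str:
    """URL of the validation endpoint for the given data stream."""
    query = urllib.parse.urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    return f"{DEBUG_ENDPOINT}?{query}"


def validate_event(measurement_id: str, api_secret: str, payload: dict,
                   timeout: float = 5.0) -> list:
    """POST the payload to the validation endpoint and return its messages.

    An empty list means no structural problems were reported.
    """
    request = urllib.request.Request(
        debug_url(measurement_id, api_secret),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body.get("validationMessages", [])
```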

K DawG