
Google Analytics provides Multi-Channel Funnels (MCF) data. I want to use this data in R to interpret the interactions between media before a conversion.

To simplify, let's say that I have 2 rows, each with a medium path and the total number of conversions associated with that path:

  • direct > cpc > organic > cpc > referral > direct | 3 conversions

  • organic > direct > cpc > referral > cpc > direct | 1 conversion

In R, how would you manage your data frame to get this information:

  • The number of times each channel comes immediately before another one. For instance, how often direct is before cpc (the number of occurrences of "direct > cpc" in my medium paths, weighted by the number of conversions).
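A minimal base-R sketch of this count on the two example rows above, assuming "before" means "immediately before" as in the "direct > cpc" example. It keeps the result in long format (one row per channel pair, with columns from / to / weight) instead of one column per combination; the object names are illustrative.

```r
# The two example rows from the question
paths <- c("direct > cpc > organic > cpc > referral > direct",
           "organic > direct > cpc > referral > cpc > direct")
conversions <- c(3, 1)

# Split every path into its sequence of channels
steps <- strsplit(paths, " > ", fixed = TRUE)

# Pair each channel with the one that immediately follows it and carry the
# path's conversion count along as a weight (single-channel paths give 0 rows)
pairs <- do.call(rbind, Map(function(x, w) {
  data.frame(from = head(x, -1), to = tail(x, -1),
             weight = rep(w, length(x) - 1), stringsAsFactors = FALSE)
}, steps, conversions))

# Sum the weights of identical (from, to) pairs: one row per pair
transitions <- aggregate(weight ~ from + to, data = pairs, FUN = sum)
transitions[order(-transitions$weight), ]
```

Here direct > cpc gets a weight of 4 (3 from the first path plus 1 from the second), and the table has only three columns however many channels appear.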

The problem is that when I do this, the final data frame has a huge number of columns (one per channel combination).

  1. How would you manipulate the data in R to get a "clear" and interpretable data frame?
  2. Would you use the same methodology, or a simpler one?
  3. Which R packages would you use (for now I am simply using stringr to manipulate the path strings)?
  4. Additionally, do you know of a mapping package that can perform a more graphical analysis?

Thanks.

Sylvain

  • Please be specific with your question. There seem to be multiple questions being asked here and one which asks for a tool recommendation. – Quintin Balsdon Dec 09 '16 at 10:48
  • Welcome to SO. You could improve your question. Please read [how to provide minimal reproducible examples in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). A good post has a specific programming question and does not just ask for general recommendations. It usually provides minimal input data, the desired output data & code tries - all copy-paste-run'able in a new/clean R session. – lukeA Dec 09 '16 at 11:51

1 Answer

Here's an example:

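# Sample data: one path per row, "|" separates the path from its conversions
df <- read.table(sep = "|", stringsAsFactors = FALSE, text = "
direct > cpc > organic | 3 conversions
organic > direct > cpc > email | 1 conversion
cpc > email | 10 conversions")
df[, 1] <- trimws(df[, 1])                       # drop spaces around "|"
df[, 2] <- as.integer(gsub("\\D", "", df[, 2]))  # keep only the digits
lst <- strsplit(df[, 1], " > ", fixed = TRUE)    # split paths into channels
# embed(x, 2)[, 2:1] gives one (from, to) row per consecutive pair;
# matrix(..., ncol = 2) keeps two-channel paths as a matrix, and
# single-channel paths (no transitions) become an empty 0 x 2 matrix
lst <- lapply(lst, function(x) {
  if (length(x) < 2) return(matrix(character(0), ncol = 2))
  matrix(embed(x, 2)[, 2:1], ncol = 2)
})
res <- as.data.frame(do.call(rbind, lst), stringsAsFactors = FALSE)
res$V3 <- rep(df[, 2], sapply(lst, nrow))        # one weight per transition
(res <- aggregate(V3 ~ V1 + V2, res, sum))       # sum weights per pair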
#        V1      V2 V3
# 1  direct     cpc  4
# 2 organic  direct  1
# 3     cpc   email 11
# 4     cpc organic  3
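If a grid reads more easily than the long table, xtabs() can cross-tabulate it into a square from × to matrix whose size grows with the number of channels rather than the number of combinations. A minimal sketch that re-creates the aggregated result from the printed output (as res_long, a name chosen here so the original res is left alone):

```r
# Aggregated pairs as printed above (V1 = from, V2 = to, V3 = weighted count)
res_long <- data.frame(
  V1 = c("direct", "organic", "cpc", "cpc"),
  V2 = c("cpc", "direct", "email", "organic"),
  V3 = c(4, 1, 11, 3),
  stringsAsFactors = FALSE
)
# Use the same channel set for rows and columns so the matrix is square
channels <- sort(unique(c(res_long$V1, res_long$V2)))
res_long$V1 <- factor(res_long$V1, levels = channels)
res_long$V2 <- factor(res_long$V2, levels = channels)

# Rows = "from" channel, columns = "to" channel, cells = summed conversions
trans <- xtabs(V3 ~ V1 + V2, data = res_long)
trans

# Optional: share of each channel's outgoing weight going to each next channel.
# Rows with no outgoing transitions (email here) come out as NaN (0 / 0).
round(prop.table(trans, margin = 1), 2)
```

Pairs that never occur show up as 0 instead of being absent, which makes it easy to compare, say, direct > cpc with cpc > direct.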

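library(igraph)
# Each (from, to) pair becomes a directed edge; the third column becomes
# the edge attribute "weight"
g <- graph_from_data_frame(setNames(res, c("source", "target", "weight")))
plot(
  g,
  edge.width = plotrix::rescale(E(g)$weight, c(1, 5)),  # needs plotrix; widths 1 to 5
  edge.label = E(g)$weight
)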

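Alongside the plot, the same edge list can be summarised per channel: how much conversion weight leaves each channel (it was followed by another touch) and how much enters it (it followed another touch). This is a base-R sketch on the aggregated output above; the names edges and channel_summary are illustrative.

```r
# Edge list re-created from the aggregated output above
edges <- data.frame(
  source = c("direct", "organic", "cpc", "cpc"),
  target = c("cpc", "direct", "email", "organic"),
  weight = c(4, 1, 11, 3),
  stringsAsFactors = FALSE
)
channels <- sort(unique(c(edges$source, edges$target)))
edges$source <- factor(edges$source, levels = channels)
edges$target <- factor(edges$target, levels = channels)

# Total weight of transitions leaving / entering each channel;
# xtabs keeps unused factor levels, so channels with no such edges get 0
channel_summary <- data.frame(
  channel    = channels,
  weight_out = as.vector(xtabs(weight ~ source, data = edges)),
  weight_in  = as.vector(xtabs(weight ~ target, data = edges))
)
channel_summary
```

On the igraph object g, strength(g, mode = "out") and strength(g, mode = "in") give the same totals, since strength() sums the "weight" edge attribute by default.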

You might also want to check out the ChannelAttribution package for attribution modelling.
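A hedged sketch of the input layout that package works with: one row per path, the channels joined with ">" in a single column, plus a conversion-count column, as in its own example data. It assumes heuristic_models(Data, var_path, var_conv) takes the data frame and the two column names; the column names path and total_conversions are our choice. The call is guarded so the snippet still runs when the package is not installed.

```r
# The question's paths in the one-row-per-path layout
path_data <- data.frame(
  path = c("direct > cpc > organic > cpc > referral > direct",
           "organic > direct > cpc > referral > cpc > direct"),
  total_conversions = c(3, 1),
  stringsAsFactors = FALSE
)

# Only run the attribution model if ChannelAttribution is available
if (requireNamespace("ChannelAttribution", quietly = TRUE)) {
  heur <- ChannelAttribution::heuristic_models(path_data, "path", "total_conversions")
  print(heur)
}
```

No splitting into pairs is needed here, because the package parses the path strings itself.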

lukeA