I am using R to get some data from Google Analytics API. In this particular scenario, I get the information about the Affinity Interest of my users segmented by gender and age group. The structure of the data I get looks something similiar to:
gender ageGroup interest sessions
male 18-24 Autos 4
male 18-24 Autos/Luxury 1
male 18-24 Autos/Vans 1
male 25-34 Autos 8
male 25-34 Autos/Luxury 2
male 25-34 Autos/Vans 2
male 25-34 Autos/Compacts 1
...
female 65+ Fashion 20
The problem with this structure is, however, Autos as the main interest contains also the sessions of the sub-categories, and if I use this data in a Pivot Table I will get the wrong information.
Therefore, I am adding the subcategory "Generalists" to every main category as its own subcategory and splitting this column in two:
for (i2 in 1:nrow(ga.genderAgeAffinityTable) ) {
# main categories <- chrFound = integer(0)
chrFound <- grep("[/]", ga.genderAgeAffinityTable$interest[i2] )
if (length(chrFound) < 1) {
ga.genderAgeAffinityTable$interest[i2] <-
sprintf("%s/Generalists", ga.genderAgeAffinityTable$interest[i2])
}
ga.genderAgeAffinityTable <- as.data.frame
(cSplit(ga.genderAgeAffinityTable, "interest", sep = "/"))
}
View(ga.genderAgeAffinityTable)
gender ageGroup interest subcategory sessions
male 18-24 Autos Generalists 4
male 18-24 Autos Luxury 1
male 18-24 Autos Vans 1
male 25-34 Autos Generalists 8
male 25-34 Autos Luxury 2
male 25-34 Autos Vans 2
male 25-34 Autos Compacts 1
...
female 65+ Fashion Generalists 20
I still have to get rid of the wrong sessions calculation, as for the first group (males, 18-24 years, Auto lovers), Generalists should have only 2 sessions (sessions - sum(other subcategories)). I am doing that using an auxId (genderAgeInterestSubcategory), summing all sessions by that auxId, merging the aggregated sessions as a new column in my dataframe and recalculating sessions for subcategory "Generalists":
ga.genderAgeAffinityTable$auxId <- sprintf("%s%s%s",
ga.genderAgeAffinityTable$gender, ga.genderAgeAffinityTable$age,
ga.genderAgeAffinityTable$interest_1 )
ga.interestAggregated <- aggregate(ga.genderAgeAffinityTable[,c("sessions")],
by=list(ga.genderAgeAffinityTable$auxId), "sum")
colnames(ga.interestAggregated) <- c("auxId", "aggregated")
ga.genderAgeAffinityTable <- (merge(ga.genderAgeAffinityTable,
ga.interestAggregated, by = 'auxId'))
for (i3 in 1:nrow(ga.genderAgeAffinityTable) ) {
if (ga.genderAgeAffinityTable$interest_2[i3] == "Generalists" ) {
# Do not recalculate sessions for interests with only Generalists as subcategory
if (ga.genderAgeAffinityTable$aggregated[i3] -
ga.genderAgeAffinityTable$sessions[i3] != 0 ) {
ga.genderAgeAffinityTable$sessions[i3] <-
ga.genderAgeAffinityTable$aggregated[i3] -
ga.genderAgeAffinityTable$sessions[i3]
}
}
}
Do you know a more straightforward way to do the same without using an auxid?