I'm looking for help on how to append scores to a new data set, based on patterns already discovered in a training data set with the same columns. Here is an example of what I want to do (taken from another one of my posts):
The code below builds a sample data set of fake online shopper data and computes the mean of money for each combination of email, browser, and country.
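library(dplyr)  # dplyr re-exports the %>% pipe, so magrittr need not be loaded separately
set.seed(123)
# Fake shopper data: three categorical columns plus a numeric "money" column
dat <- data.frame(email = sample(c("yahoo", "gmail"), 10000, replace = TRUE),
                  browser = sample(c("mozilla", "ie"), 10000, replace = TRUE),
                  country = sample(c("usa", "canada"), 10000, replace = TRUE),
                  money = runif(10000))
# Mean of money for each email/browser/country combination
dat.withmean <- dat %>%
  group_by(email, browser, country) %>%
  summarize(mean = mean(money))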
# email browser country mean
# 1 gmail ie canada 0.5172424
# 2 gmail ie usa 0.4921908
# 3 gmail mozilla canada 0.4934892
# 4 gmail mozilla usa 0.4993923
# 5 yahoo ie canada 0.5013214
# 6 yahoo ie usa 0.5098280
# 7 yahoo mozilla canada 0.4985357
# 8 yahoo mozilla usa 0.4919743
Now, let's say we have a new data set that looks like this:
# New data with the same categorical columns but no "money" column
newdat <- data.frame(email = sample(c("yahoo", "gmail"), 10000, replace = TRUE),
                     browser = sample(c("mozilla", "ie"), 10000, replace = TRUE),
                     country = sample(c("usa", "canada"), 10000, replace = TRUE))
head(newdat, n = 10)
# email browser country
#1 gmail ie usa
#2 gmail ie usa
#3 gmail mozilla canada
#4 yahoo mozilla canada
#5 gmail ie canada
#6 yahoo mozilla canada
#7 yahoo mozilla canada
#8 gmail ie usa
#9 yahoo mozilla canada
#10 gmail mozilla canada
#... 10,000 rows...
How can I go through newdat, check whether each row's combination of email, browser, and country matches a row in dat.withmean, and, if it does, append the corresponding value from the "mean" column?
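For reference, here is a minimal sketch of the kind of result I am after. It uses a join on the three key columns rather than an explicit loop; I am not sure it is the best approach. It assumes dplyr 1.0 or later (for the `.groups` argument), and the name `scored` is just a placeholder I made up for the output.

```r
library(dplyr)

set.seed(123)
n <- 10000

# Training data: same fake shopper data as above
dat <- data.frame(
  email   = sample(c("yahoo", "gmail"), n, replace = TRUE),
  browser = sample(c("mozilla", "ie"), n, replace = TRUE),
  country = sample(c("usa", "canada"), n, replace = TRUE),
  money   = runif(n)
)

# One row per email/browser/country combination, with its mean
dat.withmean <- dat %>%
  group_by(email, browser, country) %>%
  summarize(mean = mean(money), .groups = "drop")

# New data without the money column
newdat <- data.frame(
  email   = sample(c("yahoo", "gmail"), n, replace = TRUE),
  browser = sample(c("mozilla", "ie"), n, replace = TRUE),
  country = sample(c("usa", "canada"), n, replace = TRUE)
)

# Keep every row of newdat and attach the matching group mean;
# combinations that never occurred in dat would get NA
scored <- newdat %>%
  left_join(dat.withmean, by = c("email", "browser", "country"))

head(scored)
```

Because dat.withmean has exactly one row per combination, the join should leave the row count of newdat unchanged. Base R's `merge(newdat, dat.withmean, all.x = TRUE)` looks like it would do something similar, though it sorts the result by the key columns by default.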