Let this be my data:
my.data<-data.frame(name=c("a","b","b","c","c","c"))
What I need is a variable that gives, for each name, its relative frequency in the dataset. The result should look like this:
  name    target
1    a 0.1666667
2    b 0.3333333
3    b 0.3333333
4    c 0.5000000
5    c 0.5000000
6    c 0.5000000
What I tried: I computed dummy variables for each name and then, from these dummies, calculated new variables that hold the relative frequency of each name in the dataset. See below:
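library(dplyr)  # provides %>% and mutate()

# One 0/1 dummy column per distinct value of name
temp_dummies <- data.frame(spatstat::dummify(my.data$name))
my.data <- cbind.data.frame(my.data, temp_dummies)
rm(temp_dummies)

# The mean of a 0/1 dummy is the relative frequency of that name
my.data <- my.data %>%
  dplyr::mutate(a_per = mean(a),
                b_per = mean(b),
                c_per = mean(c))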
Now I need to pick out, for each row, the relative frequency that belongs to that row's name and combine these values into my target variable. I guess I should do something like the following, but I don't know what to put inside mutate.
my.data %>%
dplyr::group_by(name) %>%
dplyr::mutate(...) -> my.data
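For illustration only, here is a minimal sketch of what the grouped step might look like if the dummy columns are skipped entirely. It assumes the relative frequency is simply the group size divided by the total number of rows; the helper variable n_total is a name introduced here, not part of my original attempt.

```r
library(dplyr)

my.data <- data.frame(name = c("a", "b", "b", "c", "c", "c"))

# Total row count, stored before grouping (hypothetical helper name)
n_total <- nrow(my.data)

my.data <- my.data %>%
  group_by(name) %>%
  mutate(target = n() / n_total) %>%  # n() is the size of the current group
  ungroup()
```

Storing the total up front avoids relying on what n() means inside a grouped mutate, where it counts the rows of the group rather than of the whole data frame.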
Questions:
- How would I get my target variable using dplyr? Am I on the right track?
- Is there an easier way to achieve the same result?
- Might it be possible to write a function that does all of this stuff automatically? It seems like a pretty standard problem that we should be able to fix by simply applying a function(x) to name.
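To make the last question concrete, this is roughly the kind of function I have in mind, sketched in base R. The name rel_freq is hypothetical, and the sketch assumes that name contains no missing values (table() drops NA by default, so rows with NA would get NA here).

```r
# Hypothetical helper: relative frequency of each element's value within x
rel_freq <- function(x) {
  freq <- table(x) / length(x)              # share of each distinct value
  unname(as.numeric(freq[as.character(x)])) # look up the share for every element
}

my.data <- data.frame(name = c("a", "b", "b", "c", "c", "c"))
my.data$target <- rel_freq(my.data$name)
```

The lookup by as.character(x) is what maps the per-value shares back onto the original rows, so the result has the same length and order as the input.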