I have recently transitioned from Stata and Excel to R, so I would appreciate it if someone could help me write more efficient code. I did my best to research the answer before posting on SO.
Here's what my data looks like:
mydata<-data.frame(sassign$buyer,sassign$purch,sassign$total_)
str(mydata)
'data.frame': 50000 obs. of 3 variables:
$ sassign.buyer : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 2 1 ...
$ sassign.purch : num 10 3 2 1 1 1 1 11 11 1 ...
$ sassign.total_: num 357 138 172 272 149 113 15 238 418 123 ...
head(mydata)
  sassign.buyer sassign.purch sassign.total_
1            no            10            357
2            no             3            138
3            no             2            172
4            no             1            272
5            no             1            149
6           yes             1            113
My objective is to find the proportion of buyers (the average of a 0/1 buyer indicator) among customers with more than one purchase (purch > 1).
So, here's what I did:
Method 1: Long method
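library(psych)
# Convert the factor codes 1 ("no") / 2 ("yes") to a 0/1 indicator
check<-as.numeric(mydata$sassign.buyer)-1
myd<-cbind(mydata,check)
# Summary statistics for the rows with more than one purchase
abcd<-psych::describe(myd[myd$sassign.purch>1,])
# Row 4 of the describe() output corresponds to the check column
abcd$mean[4]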
The output I got is 0.1031536697, which is correct.
@Sathish: Here's what check looks like:
head(check)
0 0 0 0 0 1
This does what I need.
Pros of this method: it's easy and beginner-friendly. Cons: there are several. I need an extra variable (check), and overall the approach feels clunky.
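A shorter base R sketch, assuming the column names shown in str(mydata) above: comparing the factor to "yes" gives a logical vector, and mean() of a logical vector is the share of TRUE values, so no helper column is needed. The toy data frame is made up purely so the snippet runs on its own.

```r
# Toy stand-in for mydata (made-up values, same column names as above)
mydata <- data.frame(
  sassign.buyer  = factor(c("no", "no", "no", "no", "no", "yes", "no", "yes")),
  sassign.purch  = c(10, 3, 2, 1, 1, 1, 11, 11),
  sassign.total_ = c(357, 138, 172, 272, 149, 113, 238, 418)
)

# Keep the buyer values for rows with more than one purchase,
# then take the proportion that equal "yes".
share_buyers <- mean(mydata$sassign.buyer[mydata$sassign.purch > 1] == "yes")
share_buyers
```

On the real data this should give the same value as Method 1, since both average the same 0/1 indicator over the same rows (assuming there are no missing values in either column).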
Side question: I noticed that, by default, some functions don't print more decimal places even though options(digits=10) is set. For instance, here's what I got from running:
psych::describe(myd[myd$sassign.purch>1,])
               vars     n   mean     sd median trimmed    mad min max range skew
sassign.buyer*    1 34880   1.10   0.30      1    1.00   0.00   1   2     1 2.61
sassign.purch     2 34880   5.14   3.48      4    4.73   2.97   2  12    10 0.65
sassign.total_    3 34880 227.40 101.12    228  226.13 112.68  30 479   449 0.09
check             4 34880   0.10   0.30      0    0.00   0.00   0   1     1 2.61
               kurtosis   se
sassign.buyer*     4.81 0.00
sassign.purch     -1.05 0.02
sassign.total_    -0.72 0.54
check              4.81 0.00
It's only when I ran
abcd$mean[4]
that I got 0.1031536697.
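The rounding in the describe() printout looks like a display choice rather than lost precision: as far as I know, psych's print method has its own digits argument (defaulting to 2), so options(digits=10) does not reach it, while the stored numbers keep full double precision. That would explain why abcd$mean[4] shows more digits, and print(abcd, digits = 10) may show more decimals in the table. The base R sketch below uses a made-up 0/1 vector to illustrate the difference between a stored value and how it is printed.

```r
# Made-up 0/1 indicator, just to get a value with many decimals
x <- c(0, 0, 1, 0, 0, 0, 0, 1, 0)
m <- mean(x)               # stored as a full-precision double (2/9)

# Display only: these change how m is shown, not the value of m itself
print(m, digits = 10)
formatted <- sprintf("%.10f", m)
formatted

# round() is different: it returns a new, rounded number
rounded <- round(m, 2)
rounded
```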
Method 2: Using dplyr. I tried both a nested function call and pipes, but I eventually gave up on the nested call.
Method 2 | Try1: psych::describe(dplyr::filter(mydata,mydata$sassign.purch>1)[,dplyr::mutate(as.numeric(mydata$sassign.buyer)-1)])
Output:
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "c('double', 'numeric')"
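A likely reason for this error: dplyr::mutate() expects a data frame as its first argument, but in Try1 it receives as.numeric(mydata$sassign.buyer)-1, which is a plain numeric vector, so there is no method to dispatch to. The surrounding [, ...] indexing also mixes column selection with a computation. The sketch below nests the calls so that each function's output becomes the first argument of the next. It assumes dplyr is installed and uses made-up toy data with the same column names; check is the helper column name from Method 1.

```r
library(dplyr)

# Toy stand-in for mydata (made-up values, same column names as above)
mydata <- data.frame(
  sassign.buyer  = factor(c("no", "no", "no", "no", "no", "yes", "no", "yes")),
  sassign.purch  = c(10, 3, 2, 1, 1, 1, 11, 11),
  sassign.total_ = c(357, 138, 172, 272, 149, 113, 238, 418)
)

# Read from the innermost call outwards:
# 1. mutate() adds the 0/1 column to the data frame,
# 2. filter() keeps rows with more than one purchase,
# 3. summarise() takes the mean of the 0/1 column.
# Inside dplyr verbs, columns are referred to by bare name (no mydata$ prefix).
res <- summarise(
  filter(
    mutate(mydata, check = as.numeric(sassign.buyer) - 1),
    sassign.purch > 1
  ),
  meanpurch = mean(check)
)
res
```

To get the full describe() table as in Try1, the inner filter(mutate(...)) result could be passed to psych::describe() in place of summarise().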
Method 2 | Try2: Using pipes:
mydata %>% mutate(newcol = as.numeric(sassign.buyer)-1) %>% dplyr::filter(sassign.purch>1) %>% summarise(meanpurch = mean(newcol))
This worked, and I got meanpurch = 0.1031537. However, I still don't understand what went wrong in Try1.
Any thoughts on why it isn't working?