
My next question concerns 95% confidence intervals. My dataset has many categories (3,500), and some of them have only a few observations, so when I tried to calculate the 95% CI I got an error.

mydata=read.csv(cher,sep=";",dec=",")
View(mydata)
confint <- function(x) t.test(x)$conf.int 
c <- aggregate(. ~ group, data = mydata, confint) 


Error in t.test.default(x) : not enough 'x' observations

How can I write this confint function so that it detects categories with missing values or too few observations and skips them, and then calculates the 95% CI only for the categories that have enough observations? Thanks for your help.

dput example

price    group
900000   Mercedes-Benz-AXOR-2004
NA       Mercedes-Benz-AXOR-2004
NA       Mercedes-Benz-AXOR-2004
NA       Mercedes-Benz-AXOR-2004
NA       Mercedes-Benz-AXOR-2004
NA       Mercedes-Benz-AXOR-2004
NA       Mercedes-Benz-AXOR-2004
NA       Mercedes-Benz-AXOR-2004
NA       Mercedes-Benz-AXOR-2004
NA       Mercedes-Benz-AXOR-2004
1750000  Mercedes-Benz-AXOR-2004
900000   Peterbilt-387-2002
NA       Mercedes-Benz-AXOR-2004
NA       Peterbilt-387-2002
NA       Peterbilt-387-2002
NA       Mercedes-Benz-AXOR-2004
NA       Mercedes-Benz-AXOR-2004
NA       Peterbilt-387-2002
1100000  Peterbilt-387-2002
H.Siw
  • Please provide a [reproducible data set](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) when you're asking a question. Posting the actual `dput` output would help us with answering your question. – Adam Quek Apr 19 '17 at 04:53
  • Hi, Adam. I gave an example of the real data. The full dataset is very sparse, although it has only 2 columns. It's easier for me to upload the whole file; it contains truck sales data by brand for different countries: [mydata.csv](https://www.sendspace.com/file/v0rvl0) – H.Siw Apr 19 '17 at 10:57

1 Answer


I found the solution:

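library("psych")  # loaded in the original answer; none of the lines below call it
# Convert the first column (price) to numeric; non-numeric entries become NA
mydata[[1]] = as.numeric(as.character(mydata[[1]]))
# Count the non-missing prices in each group
mydata$group_n = ave(mydata[[1]], mydata$group, FUN = function(x) sum(!is.na(x)))
# Keep only the rows of groups with more than 4 non-missing observations
mydata_high3<-mydata[(mydata$group_n)>4,]
# t-based 95% CI (this definition is overwritten by the next line)
confint <- function(x) t.test(x)$conf.int
# Normal-approximation 95% CI: mean +/- 1.96 * standard error
confint = function(x) mean(x, na.rm = TRUE) + c(-1,1) * sd(x, na.rm = TRUE)/sqrt(sum(!is.na(x)))*1.96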
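
An alternative to filtering the data first is to let the CI function itself skip groups that are too small. The following is only a sketch, not something from the original post: `safe_ci`, its `min_n` argument and the `toy` data frame are hypothetical names introduced for illustration, and the toy values are modelled on the sample in the question. It returns `NA` bounds for any group with fewer than two usable values (or with zero variance, where `t.test()` also fails) and the usual t-based interval otherwise.

```r
# Hypothetical helper: 95% CI from t.test(), or NA bounds when the group
# has too few non-missing values (or no variation) for t.test() to run.
safe_ci <- function(x, min_n = 2) {
  x <- x[!is.na(x)]
  # t.test() needs at least 2 values, so never go below that threshold
  if (length(x) < max(min_n, 2) || sd(x) == 0) {
    return(c(lower = NA_real_, upper = NA_real_))
  }
  ci <- t.test(x)$conf.int
  c(lower = ci[1], upper = ci[2])
}

# Toy data modelled on the sample in the question (values are illustrative)
toy <- data.frame(
  price = c(900000, NA, 1750000, 900000, NA, 1100000, 500000),
  group = c("Mercedes-Benz-AXOR-2004", "Mercedes-Benz-AXOR-2004",
            "Mercedes-Benz-AXOR-2004", "Peterbilt-387-2002",
            "Peterbilt-387-2002", "Peterbilt-387-2002", "Single-Obs-Group")
)

# na.action = na.pass keeps rows with missing prices, so the NA handling
# happens inside safe_ci() rather than in aggregate() itself
res <- aggregate(price ~ group, data = toy, FUN = safe_ci, na.action = na.pass)
print(res)
```

Here `res$price` is a two-column matrix (`lower`, `upper`) with one row per group, and groups that were skipped show `NA` in both columns. On the real data the equivalent call would presumably be `aggregate(price ~ group, data = mydata, FUN = safe_ci, na.action = na.pass)`, assuming the columns are named `price` and `group` as in the sample. Note also that the `1.96` multiplier above is a normal approximation; for small groups the t-based interval from `t.test()` is wider.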
H.Siw