
My dataset looks something like this:

Category   Value1   Value2
       A       19      143
       A       12      124
       A       21      130  
       B       23      323
       B       24      323
       B       23      342
       B       24      233
       B       27      234
       C       28      212
       C       29      233      
       D       11      365
       D       12      323
       D       13      344

This dataset has many categories (A, B, C, D, etc.) and two value columns.

How can I find the 90th percentile of these values, category-wise?

Output should be in this pattern:

(screenshot of the expected output: one row per category with the 90th percentiles of Value1 and Value2)

Abdul Shiyas
  • Please post a reproducible example, not screenshots. But I doubt the usefulness of a 90th percentile calculated on 2 values. – Pascal Jun 01 '15 at 05:40
  • Here is a link to refer to for making reproducible examples: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. In general, as @Pascal mentioned, images are not easy to test against; I had to type the values in manually (hands still aching :-) – akrun Jun 01 '15 at 05:48
  • @Pascal Sorry, I will include a reproducible example next time. – Abdul Shiyas Jun 01 '15 at 05:51

1 Answer


Try

library(dplyr)
df1 %>%
   group_by(Category) %>% 
   summarise_each(funs(quantile(., 0.90)))
#    Category Value1 Value2
#1        A   20.6  140.4
#2        B   25.8  334.4
#3        C   28.9  230.9
#4        D   12.8  360.8
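
In newer versions of dplyr, `summarise_each()` and `funs()` are deprecated; an equivalent sketch using `across()` (assuming dplyr >= 1.0) is:

df1 %>%
   group_by(Category) %>%
   summarise(across(everything(), ~ quantile(.x, 0.90)))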

Or

library(data.table)
setDT(df1)[, lapply(.SD, quantile, probs = 0.90), by = Category]

Or using aggregate from base R

aggregate(. ~ Category, df1, FUN = quantile, probs = 0.90)
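
All of these use `quantile()`'s default type-7 linear interpolation, so a 90th percentile is defined even for the two-row category C:

quantile(c(28, 29), 0.90)
#  90%
# 28.9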

data

df1 <- structure(list(Category = c("A", "A", "A", "B", "B", "B", "B", 
"B", "C", "C", "D", "D", "D"), Value1 = c(19, 12, 21, 23, 24, 
23, 24, 27, 28, 29, 11, 12, 13), Value2 = c(143, 124, 130, 323, 
323, 342, 233, 234, 212, 233, 365, 323, 344)), .Names = c("Category", 
"Value1", "Value2"), row.names = c(NA, -13L), class = "data.frame")
akrun
  • The problem is that I am importing a CSV file, so the columns in the imported file are "factor". How can I convert them to character and numeric? – Abdul Shiyas Jun 01 '15 at 07:03
  • @AbdulShiyas In `read.csv/read.table`, there is an option to prevent the 'non-numeric' columns from being converted to 'factors', i.e. `df1 <- read.csv('yourfile.csv', header=TRUE, stringsAsFactors=FALSE)`. I am not sure why the Value columns are not read as 'numeric' class. Can you check whether there are any non-numeric elements in those columns? – akrun Jun 01 '15 at 07:06
  • `regfile = read.csv("C:/Users/registration.csv"); l = as.data.frame(regfile); class(l$VillageName)` gives "factor" as output. Only if it is character will we get the desired output. – Abdul Shiyas Jun 01 '15 at 07:09
  • @AbdulShiyas `regfile <- read.csv(...)` is itself a data.frame; you don't have to convert it to a data.frame again. As I said in the comments, use `stringsAsFactors=FALSE` to prevent converting to 'factor'. – akrun Jun 01 '15 at 07:12
  • Yeah, I got it. In Excel, the number format included commas, e.g. 2,543.50 instead of 2543.50. – Abdul Shiyas Jun 01 '15 at 07:13
  • @AbdulShiyas In that case, we need to replace the `,` with `''`. After reading the dataset, i.e. `regfile`, do `regfile[2:3] <- lapply(regfile[2:3], function(x) as.numeric(sub(',', '', x)))`, assuming that the 2nd and 3rd columns are the value columns as you showed. – akrun Jun 01 '15 at 07:15
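
Putting the comment thread together, a minimal sketch of the import-and-clean workflow; the file name and the column positions 2:3 are illustrative, and `gsub` is used instead of `sub` so that values with more than one thousands separator are also handled:

# hypothetical file name; adjust the path and column indices to your CSV
regfile <- read.csv("registration.csv", header = TRUE, stringsAsFactors = FALSE)
# strip thousands separators such as "2,543.50" before converting to numeric
regfile[2:3] <- lapply(regfile[2:3], function(x) as.numeric(gsub(",", "", x)))
# then any of the approaches above applies, e.g. with base R
# (assuming the cleaned file has the same Category/Value1/Value2 layout):
aggregate(. ~ Category, regfile, FUN = quantile, probs = 0.90)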