Splitting data based on the ranges in R

Question

I would like to know how to divide the subject to 4 different ranges/level. Each level has a certain range. Following is the data.

Std   Name   Subject  Percentage
   2   Vinay   eng      50
   2   Vinay   math     60
   2   Vinay   hindi    70
   2   Rohan   eng      70
   2   vas     mat      50
   2   dheer   eng      35
   2   dheer   math     90
   2   dheer   hindi    80
   2   Bhas    eng      90
   2   Bhas    math     35
   2   Bhas    hindi    50

The four bucket ranges are as follows. <=35, 35-50, 50-75, >75

Expected output:

Std Subject 0-35  35-50  50-75  >75
2    Eng     25%  25%    25%   25%
2    Mat     25%  25%    25%   25%
2    Hin     0%   25%    25%   25%

P.s Values of the ranges are percentage of students scored in that range.

Thanks in advance

Would be good to have a code to reproduce the data used. – m-dz Apr 05 '16 at 15:33 — m-dz, Apr 05 '16 at 15:33
@M.D the data itself is a rawdata. – Ram Apr 05 '16 at 15:44 — Ram, Apr 05 '16 at 15:44

score 0 · Answer 1 · answered Apr 05 '16 at 15:54

This should work, maybe a little more work for the formating:

df<-read.table(header = TRUE, sep=",", text="Std,   Name,   Subject,  Percentage
              2,   Vinay,eng,     50
               2,   Vinay,math,     60
               2,   Vinay,hindi,    70
               2,   Rohan,eng,      70
               2,   vas,math,      50
               2,   dheer,eng,      35
               2,   dheer,math,    90
               2,   dheer,hindi,    80
               2,   Bhas,eng,     90
               2,   Bhas,math,     35
               2,   Bhas,hindi,    50")

breaks<-c(0, 35, 50, 75, 100)
t<-table(df$Subject, responseName=cut(df$Percentage, breaks = breaks) )
format(t/rowSums(t), digits=3)

Thanks for the response, Here since the std is same across al the records it showing group by, What if standards are different where to include by clause in the code? — Ram, Apr 13 '16 at 07:03

score 0 · Accepted Answer · answered Apr 05 '16 at 16:03

0

A possible data.table solution:

library(data.table)

dat <- data.table(Std = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
                  Name = c("Vinay", "Vinay", "Vinay", "Rohan", "vas", "dheer", "dheer", "dheer", "Bhas", "Bhas", "Bhas"),
                  Subject = c("eng", "math", "hindi", "eng", "mat", "eng", "math", "hindi", "eng", "math", "hindi"),
                  Percentage = c(50L, 60L, 70L, 70L, 50L, 35L, 90L, 80L, 90L, 35L, 50L))

dat[, PCTs := cut(Percentage,
                  breaks = c(0, 35, 50, 75, 100),
                  include.lowest = TRUE)]

res <- dat[, list(
               "0-35" = sum(PCTs == "[0,35]") / .N * 100,
               "35-50" = sum(PCTs == "(35,50]") / .N * 100,
               "50-75" = sum(PCTs == "(50,75]") / .N * 100,
               ">75" = sum(PCTs == "(75,100]") / .N * 100
             ),
             by = c("Std", "Subject")]

print(res, digits = 2)

answered Apr 05 '16 at 16:03

m-dz

2,342
17
29

Thanks for the solution, But if I have 500 records as this, Its tough to hardcode in the data.table,Importing a file would be the option, tried using dat<-read.csv(file.choose()), But it didn't work, Could you please tell me how to do ? – Ram Apr 05 '16 at 16:31
The code above assumes a `data.table` is used, so either do `as.data.table(yourObject)` or use `fread()` function from `data.table` package. – m-dz Apr 05 '16 at 16:44
Great! `data.table` syntax can be a bit odd at first glance, but do not get scared of! – m-dz Apr 07 '16 at 08:34
from the above code it also including 35,50,75,100 in the range , I tried with [0,35), [35,50), [50,75), [75,100] but its showing as 0 – Ram Apr 15 '16 at 13:00
With `"[0,35]"`, `"(35,50]"` etc. you are only selecting specific `PCTs` values, to change which endpoints you want to include and which exclude check options of `cut()` functions, i.e. use `?cut` in R console. – m-dz Apr 15 '16 at 13:57
Can you tell me how to change in cut, tried couple of efforts, but in vain – Ram Apr 15 '16 at 13:59
One of the arguments of `cut` is called `right`, and it is a `logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.` – m-dz Apr 15 '16 at 14:01
Got that. Thanks @M.D – Ram Apr 15 '16 at 14:04
Sure, no problem at all. – m-dz Apr 15 '16 at 14:04

Splitting data based on the ranges in R

2 Answers2