How to select data based on a list from a split data frame and then recombine in R

Question

I am trying to do the following. I have a dataset Test:

 Item_ID     Test_No        Category    Sharpness       Weight   Viscocity 
 132           1              3        14.93199362  94.37250417 579.4236727
 676           1              4        44.58750591  70.03232054 1829.170727
 699           2              5        89.02760079  54.30587287 1169.226863
 850           3              6        30.74535903  83.84377678 707.2280513
 951           4              237      67.79568019  51.10388484 917.6609965
1031           5              56       74.06697003  63.31274502 1981.17804
1175           4              354      98.9656142   97.7523884  100.7357981
1483           5              726      9.958040999  51.29537311 1222.910211
1529           7              800      64.11430235  65.69780939 573.8266137
1698           9              125      67.83105185  96.53847341 486.9620194
1748           9              1005     49.43602318  52.9139591  1881.740184
2005           9              28       26.89821508  82.12663209 1709.556135
2111           2              76       83.03593144  85.23622731 276.5088502

I want to split this data by Test_No and then compute the number of unique Category values per Test_No, as well as the median Category value. I chose to use split and sapply in the following way, but I am getting an error about a missing parenthesis. Is there anything wrong with my approach? Please find my code below:

function(CatRange){
  c(Cat_Count = length(unique(CatRange$Category)), Median_Cat = median(unique(CatRange$Category), na.rm = TRUE) )
}

CatStat <- do.call(rbind,sapply(split(Test, Test$Test_No), function(ModRange)))

Appending to my question: I would like to display the data with the following columns: Test_No, Category, Median_Cat and Cat_Count

Ronak Shah · Accepted Answer · 2017-02-02T08:25:16.623


We can try with dplyr

library(dplyr)
Test %>%
  group_by(Test_No) %>%
  summarise(Cat_Count = n_distinct(Category), 
            Median_Cat = median(Category,na.rm = TRUE), 
            Category = toString(Category))

#  Test_No Cat_Count Median_Cat      Category
#    <int>     <int>      <dbl>         <chr>
#1       1         2        3.5          3, 4
#2       2         2       40.5         5, 76
#3       3         1        6.0             6
#4       4         2      295.5      237, 354
#5       5         2      391.0       56, 726
#6       7         1      800.0           800
#7       9         3      125.0 125, 1005, 28

Or if you prefer base R we can also try with aggregate

# Test is the data frame from the question
aggregate(Category ~ Test_No, Test, function(x) c(Cat_Count = length(unique(x)), 
                   Median_Cat = median(x, na.rm = TRUE), Category = toString(x)))
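
If the goal is to keep every Category row next to its group statistics (the columns Test_No, Category, Median_Cat and Cat_Count listed in the appended question), one base R possibility is ave(), which computes a value per group and returns it aligned with the original rows. This is only a sketch: the data frame is rebuilt here from the values in the question so the example runs on its own, and the name `result` is an assumption, not something from the original post.

```r
# Rebuild the relevant columns of the question's data (values copied from the post).
Test <- data.frame(
  Item_ID  = c(132, 676, 699, 850, 951, 1031, 1175, 1483, 1529, 1698, 1748, 2005, 2111),
  Test_No  = c(1, 1, 2, 3, 4, 5, 4, 5, 7, 9, 9, 9, 2),
  Category = c(3, 4, 5, 6, 237, 56, 354, 726, 800, 125, 1005, 28, 76)
)

# ave() applies FUN within each Test_No group and returns a vector
# with one entry per original row, so no rows are collapsed.
Test$Cat_Count  <- ave(Test$Category, Test$Test_No,
                       FUN = function(x) length(unique(x)))
Test$Median_Cat <- ave(Test$Category, Test$Test_No,
                       FUN = function(x) median(x, na.rm = TRUE))

# Keep only the requested columns, ordered by test number.
# 'result' is a hypothetical name chosen for this sketch.
result <- Test[order(Test$Test_No),
               c("Test_No", "Category", "Cat_Count", "Median_Cat")]
print(result)
```

Unlike the summarise and aggregate versions, this keeps one row per item, so the same Category can appear under several Test_No values without being pasted into a string.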

As far as the function you wrote is concerned, I think there are some syntax issues in it.

new_func <- function(CatRange){
 c(Cat_Count = length(unique(CatRange$Category)), 
   Median_Cat = median(unique(CatRange$Category), na.rm = TRUE), 
   Category = toString(CatRange$Category))
}

data.frame(t(sapply(split(Test, Test$Test_No), new_func)))

#  Cat_Count Median_Cat      Category
#1         2        3.5          3, 4
#2         2       40.5         5, 76
#3         1          6             6
#4         2      295.5      237, 354
#5         2        391       56, 726
#7         1        800           800
#9         3        125 125, 1005, 28
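
For reference, the call in the question fails to parse because `function(ModRange)` is a function header with no body, and the helper defined above it was never assigned to a name. Note also that c() with mixed numbers and strings coerces everything to character. A minimal sketch of the split/apply/recombine pattern that avoids both issues is shown below; it assumes the data frame is called Test as in the question, and the helper name `cat_stats` is hypothetical.

```r
# Rebuild the relevant columns of the question's data (values copied from the post).
Test <- data.frame(
  Test_No  = c(1, 1, 2, 3, 4, 5, 4, 5, 7, 9, 9, 9, 2),
  Category = c(3, 4, 5, 6, 237, 56, 354, 726, 800, 125, 1005, 28, 76)
)

# Hypothetical helper: returns a one-row data frame per group,
# so numeric columns stay numeric instead of being coerced by c().
cat_stats <- function(CatRange) {
  data.frame(
    Test_No    = CatRange$Test_No[1],
    Cat_Count  = length(unique(CatRange$Category)),
    Median_Cat = median(unique(CatRange$Category), na.rm = TRUE),
    Category   = toString(CatRange$Category)
  )
}

# split() makes one data frame per Test_No, lapply() summarises each,
# and do.call(rbind, ...) stacks the one-row results back together.
CatStat <- do.call(rbind, lapply(split(Test, Test$Test_No), cat_stats))
print(CatStat)
```

Using lapply() rather than sapply() keeps the result a plain list of data frames, which is what rbind expects here.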

edited Feb 02 '17 at 08:25

answered Feb 02 '17 at 07:16

Ronak Shah


My dataframe is 'Test' and I do not see that mentioned. Is there something that I am missing ? – jaycee4u Feb 02 '17 at 07:26
Thanks Ronak ! That worked ! I used the dplyr method ! – jaycee4u Feb 02 '17 at 07:32
However, it would have been great if I could display the category information also. Is there any way to do that? That's the reason why I was trying to use sapply. – jaycee4u Feb 02 '17 at 07:36
@jaycee4u What `category` information do you need? Add your expected output to the question. – Ronak Shah Feb 02 '17 at 07:37
Just appended my question. I wanted the `Test` to which the `Category` belonged. The categories are not unique. The same `Category` might be present across multiple `Test_ID`. Sorry for the confusion. – jaycee4u Feb 02 '17 at 07:41
@jaycee4u updated the answer. Is this what you needed? – Ronak Shah Feb 02 '17 at 07:42
new_func did not work. It threw up a lot of warnings and the output was a list containing zeros. – jaycee4u Feb 02 '17 at 07:50
