Finding values by each specialty

Question

I have a dataset with a column of medical specialties. The specialties repeat, and each row has a count. I want to sum the counts for each specialty while ignoring rows with a count below 10. How do I do this in R?

For example:

        Col1   Col2 
Internal Med     11
Internal Med     12
   Neurology      5
   Neurology     13
Internal Med      9

I should get:

Internal Med: 12 + 11 = 23 (9 is ignored)
Neurology: 13 (5 is ignored)

joel.wilson · Answer 1 · 2017-02-03T20:10:19.567

1

# method 1: data.table
library(data.table)
# keep rows with Col2 of 10 or more, then sum Col2 within each Col1
setDT(df)[Col2 >= 10, sum(Col2), by = .(Col1)]

# OR
# method 2: dplyr
library(dplyr)
df %>% group_by(Col1) %>% 
       filter(Col2 >= 10) %>% 
       summarise(sum(Col2))

#           Col1 `sum(Col2)`
# 1 Internal Med          23
# 2    Neurology          13

edited Feb 03 '17 at 20:10

answered Feb 03 '17 at 19:55


What is %>%? I am confused: should I use the first method, the second method, or both? – kobe2792 Feb 03 '17 at 20:02
@RikinMathur It's an operator... either of the methods will work – joel.wilson Feb 03 '17 at 20:04
Thanks, but it says Error: could not find function "%>%" – kobe2792 Feb 03 '17 at 20:04
`%>%` is part of the `magrittr` package, which the `dplyr` package loads automatically. It works like the pipe `|` in Linux/Unix, if that helps. There are a number of related questions on SO that describe it. – lmo Feb 03 '17 at 20:05
OK, it worked, but now I am getting an error that group_by can't group by that column. Do I need to provide each specialty name? – kobe2792 Feb 03 '17 at 20:08
@RikinMathur Here `df` refers to the name of your data frame. Cross-check the column names – joel.wilson Feb 03 '17 at 20:09
@RikinMathur Also, I hope the code explains itself... do you have any questions about what it's doing? – joel.wilson Feb 03 '17 at 20:15
1

@joel.wilson probably we need to do the `filter` first and then do `group_by` with `dplyr`. – Sandipan Dey Feb 03 '17 at 20:24
1

@SandipanDey Yes, it seems that could help a little – joel.wilson Feb 03 '17 at 20:40
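
A minimal sketch of the filter-first ordering suggested in the comments above. It assumes the question's sample data sit in a data frame called `df` (rebuilt by hand here) and that "less than 10" means a row with exactly 10 is kept. The column name `total` and the object names `result` and `result_nested` are illustrative only.

```r
library(dplyr)

# Sample data from the question, rebuilt by hand (assumption: character and numeric columns)
df <- data.frame(
  Col1 = c("Internal Med", "Internal Med", "Neurology", "Neurology", "Internal Med"),
  Col2 = c(11, 12, 5, 13, 9)
)

# Drop rows below 10 first, then group by specialty and sum
result <- df %>%
  filter(Col2 >= 10) %>%
  group_by(Col1) %>%
  summarise(total = sum(Col2))

# The same steps without the pipe: x %>% f(y) is equivalent to f(x, y)
result_nested <- summarise(group_by(filter(df, Col2 >= 10), Col1), total = sum(Col2))
```

Naming the summarised column (here `total`) avoids the backticked `sum(Col2)` header that appears in the answer's output.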

score 0 · Answer 2 · answered Feb 03 '17 at 20:29

A purely base R solution:

data <- data.frame(
  Col1 = c("IM", "IM", "N", "N", "IM"),
  Col2 = c(11, 12, 5, 13, 9)
)

# sums in groups
aggregate(data$Col2, by = list(data$Col1), FUN = sum)

# sums in groups for observations with Col2 >= 10
aggregate(data$Col2[data$Col2 >= 10], by = list(data$Col1[data$Col2 >= 10]), FUN = sum)

(but I prefer dplyr)

score 0 · Answer 3 · answered Feb 03 '17 at 20:34

With base R:

aggregate(Col2~Col1, subset(df, Col2 >= 10), sum)
#            Col1 Col2
#1   Internal Med   23
#2      Neurology   13

or

subdf <- subset(df, Col2 >= 10)
data.frame(Col2 = tapply(subdf$Col2, subdf$Col1, sum))
#               Col2
# Internal Med   23
# Neurology      13

Jealie · Answer 4 · 2017-02-03T22:24:09.053

0

The simplest would be using xtabs:

xtabs( Col2 ~ Col1, df, subset = Col2 >= 10 )
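
As a rough illustration of the `xtabs` approach on the question's sample data, the sketch below also converts the resulting table into a data frame. The data frame `df` is rebuilt by hand, and the names `tab` and `out` are hypothetical; the threshold assumes rows with exactly 10 should be kept.

```r
# Sample data from the question, rebuilt by hand (assumption)
df <- data.frame(
  Col1 = c("Internal Med", "Internal Med", "Neurology", "Neurology", "Internal Med"),
  Col2 = c(11, 12, 5, 13, 9)
)

# Sum Col2 within each Col1, using only rows where Col2 is at least 10
tab <- xtabs(Col2 ~ Col1, data = df, subset = Col2 >= 10)

# Convert the table to a data frame; the summed column is named "Freq" by default
out <- as.data.frame(tab)
```

The result of `xtabs` is a table object rather than a data frame, so the `as.data.frame()` step is only needed if later code expects columns.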

edited Feb 03 '17 at 22:24

answered Feb 03 '17 at 20:52

