Calculating median based on segments in R

Question

Hi, I want to calculate the median of the values in one column, grouped by the segment they fall into, where the segment is given by another column. The initial data structure looks like this:

Column A    Column B  
559         1  
559         1  
322         1  
661         2  
661         2  
662         2  
661         2  
753         3  
752         3  
752         3  
752         3  
752         3  
328         4  
328         4  
328         4

The calculated medians would be based on column A and the output would look like this:

Column A    Column B    Median
559         1           559
559         1           559
322         1           559
661         2           661
661         2           661
662         2           661
661         2           661
753         3           752
752         3           752
752         3           752
752         3           752
752         3           752
328         4           328
328         4           328
328         4           328

The median is calculated on column A over each set of rows that share the same value in column B. For example, all values of column A where column B equals 1 form one group, and their median is written to the Median column for every row in that group.

I need to do this in R but haven't been able to crack it. Is there a way to do this with dplyr or any other package?

Thanks

in addition to the answer below, using `dplyr` you can do `df %>% group_by(column2) %>% mutate(median = median(column1)) ` — amatsuo_net, Jul 31 '17 at 13:47
For some reason this doesn't work; I get the following error: `Error in mutate_impl(.data, dots) : incompatible types, expecting a integer vector` — Mouad_Seridi, Jul 31 '17 at 13:56
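
A possible explanation for the error in the comment above, offered as an assumption rather than a confirmed diagnosis: `median()` returns an integer for an odd-sized group of integers and a double for an even-sized one, so the per-group results can have mixed types. A minimal sketch that avoids this by converting to numeric first. The data frame name `df` and the column names `A` and `B` are placeholders standing in for "Column A" and "Column B" from the question.

```r
library(dplyr)

# Data from the question; A and B are assumed (hypothetical) column names
df <- data.frame(
  A = c(559L, 559L, 322L, 661L, 661L, 662L, 661L,
        753L, 752L, 752L, 752L, 752L, 328L, 328L, 328L),
  B = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L)
)

result <- df %>%
  group_by(B) %>%
  # as.numeric() makes median() return a double in every group
  mutate(Median = median(as.numeric(A))) %>%
  ungroup()
```

`mutate()` keeps every row and repeats the group median on each of them, which is the layout the question asks for; `summarise()` would instead collapse each group to a single row.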

score 0 · Answer 1 · answered Jul 31 '17 at 13:46

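You can use the data.table package: convert your data to a data.table, then assign the grouped median by reference. Note that the column name needs backticks rather than quotes, because median('Column A') operates on the literal string and not on the column.

library(data.table)
dt <- as.data.table(data)
# as.numeric() makes median() return a double for every group
dt[, Median := median(as.numeric(`Column A`)), by = "Column B"]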

quant

score 0 · Answer 2 · answered Jul 31 '17 at 13:50

Here it is, done in base R and with data.table. Apologies in advance: my base R approach might be a bit cumbersome, as I do not use it very often.

exampleData <- data.frame(A = runif(10, 0, 10), B = sample(2, 10, replace = TRUE))

# data.frame option: compute one median per group, then look up each row's group by name
exampleData$Median <- tapply(exampleData$A, exampleData$B, median)[as.character(exampleData$B)]

# data.table option
library(data.table)
exampleData <- data.table(exampleData)
exampleData[, Median_Data_Table_Way := median(A), by = B]
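
As an alternative to the `tapply()` lookup, base R's `ave()` applies a function within groups and returns a vector aligned with the original rows. The following is a sketch on the data from the question, not part of the original answer; `df`, `A`, and `B` are assumed names.

```r
# Data from the question; A and B are assumed (hypothetical) column names
df <- data.frame(
  A = c(559, 559, 322, 661, 661, 662, 661,
        753, 752, 752, 752, 752, 328, 328, 328),
  B = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4)
)

# ave() computes median(A) within each level of B and
# places the result on every row of that group
df$Median <- ave(df$A, df$B, FUN = median)
```

This needs no extra packages and no name-based lookup, since `ave()` handles the realignment to the original row order itself.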
