Hi, I want to calculate the median of the values in one column, grouped by the segment they fall into, which is given by another column. The initial data looks like this:

Column A    Column B  
559         1  
559         1  
322         1  
661         2  
661         2  
662         2  
661         2  
753         3  
752         3  
752         3  
752         3  
752         3  
328         4  
328         4  
328         4  

The calculated medians would be based on column A and the output would look like this:

Column A    Column B    Median
559         1           559
559         1           559
322         1           559
661         2           661
661         2           661
662         2           661
661         2           661
753         3           752
752         3           752
752         3           752
752         3           752
752         3           752
328         4           328
328         4           328
328         4           328

The median is calculated on column A within each set of rows that share the same column B value. For example, for all rows where column B is 1, we take the median of their column A values and write it into the Median column for each of those rows.

I need to do this in R but haven't been able to crack it. Is there a way to do this with dplyr or any other package?

Thanks

  • In addition to the answer below, using `dplyr` you can do `df %>% group_by(column2) %>% mutate(median = median(column1))` – amatsuo_net Jul 31 '17 at 13:47
  • For some reason this doesn't work; I get the following error: `Error in mutate_impl(.data, dots) : incompatible types, expecting a integer vector` – Mouad_Seridi Jul 31 '17 at 13:56
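
A possible explanation for the error in the comment above, assuming the value column is stored as integer: `median()` returns an integer for groups with an odd number of rows and a double for groups with an even number, and dplyr versions of that era appear to have rejected the mixed types within one `mutate()`. A minimal sketch of a workaround is to convert to numeric before taking the median. The data frame name `df` and the column names `A` and `B` are placeholders standing in for "Column A" and "Column B" from the question.

```r
library(dplyr)

# Sample data from the question, stored as integers on purpose
# (an assumption about how the real data is typed).
df <- data.frame(
  A = c(559L, 559L, 322L, 661L, 661L, 662L, 661L,
        753L, 752L, 752L, 752L, 752L, 328L, 328L, 328L),
  B = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L)
)

# Convert A to numeric so every group yields a double,
# then attach the per-group median to each row.
result <- df %>%
  group_by(B) %>%
  mutate(Median = median(as.numeric(A))) %>%
  ungroup()

print(result)
```

On recent dplyr versions the conversion may no longer be required, but it is harmless to keep.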

2 Answers


You can load the data.table package with library(data.table) and then convert your data to a data.table:

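library(data.table)
# Convert the data frame to a data.table
dt <- as.data.table(data)
# Backticks refer to the column itself; quoting it as 'Column A' would pass a string to median()
dt[, Median := median(`Column A`), by = "Column B"]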

quant

Here it is, done both the base R way and the data.table way. Apologies in advance: my base R approach might be a bit cumbersome, as I do not use it too often.

exampleData=data.frame(A=runif(10,0,10),B=sample(2,10,replace=TRUE))


# Data.frame option
exampleData$Median=tapply(exampleData$A,exampleData$B,median)[as.character(exampleData$B)]

# Data.table option
library(data.table)
exampleData=data.table(exampleData)
exampleData[,Median_Data_Table_Way:=median(A),by=B]
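
As a cross-check on the question's own data, here is a minimal base R sketch that uses `ave()` rather than the `tapply()` lookup above. This is a swapped-in alternative, not the approach from either answer. The names `df`, `A`, and `B` are placeholders for the asker's data frame and its "Column A" and "Column B".

```r
# Sample data copied from the question.
df <- data.frame(
  A = c(559, 559, 322, 661, 661, 662, 661,
        753, 752, 752, 752, 752, 328, 328, 328),
  B = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4)
)

# ave() applies FUN within each group of B and returns a vector
# aligned with the original rows, so it can be assigned directly.
df$Median <- ave(as.numeric(df$A), df$B, FUN = median)

print(df)
```

Because `ave()` keeps the original row order, no merge or name-based lookup is needed afterwards.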