10

I am wondering if there is any way to compute the median of each row in a data frame. I understand the function `rowMeans` exists, but I do not believe there is a row median function. I would like to store the results in a new column in the data frame. Here is my example.

I tried to look online. There was one mention of row medians, but I could not find the function in R.

   C1<-c(3,2,4,4,5)
   C2<-c(3,7,3,4,5)
   C3<-c(5,4,3,6,3)
   DF <- data.frame(ID=c("A","B","C","D","E"),C1=C1,C2=C2,C3=C3)

   DF 


  # This is as far as I have gotten, but not streamlined

  MA <- median(c(3, 3, 5), na.rm = TRUE)   # A
  MB <- median(c(2, 7, 4), na.rm = TRUE)   # B
  MC <- median(c(4, 3, 3), na.rm = TRUE)   # C
  MD <- median(c(4, 4, 6), na.rm = TRUE)   # D
  ME <- median(c(5, 5, 3), na.rm = TRUE)   # E

  CM <- c(MA, MB, MC, MD, ME)


   ID C1 C2 C3
  1  A  3  3  5
  2  B  2  7  4
  3  C  4  3  3
  4  D  4  4  6
  5  E  5  5  3

   ID C1 C2 C3  CM
  1  A  3  3  5
  2  B  2  7  4
  3  C  4  3  3
  4  D  4  4  6
  5  E  5  5  3

Is there any way I can streamline the process so it would be like DF$CM <- median(...

Nick Benelli
  • 4
    There's a function `matrixStats::rowMedians`. [Related: how to add median value to rows?](https://stackoverflow.com/questions/28077887/how-to-add-median-value-to-rows); [How to calculate row medians efficiently with data.table](https://stackoverflow.com/questions/48885416/how-to-calculate-row-medians-efficiently-with-data-table) – pogibas Jan 25 '19 at 13:50
  • 2
    Possible duplicate of [Find median of every row using matrixStats::rowMedians](https://stackoverflow.com/questions/51525152/find-median-of-every-row-using-matrixstatsrowmedians) – pogibas Jan 25 '19 at 13:52
  • As noted by jogo, your go-to function for anything row-wise should be `apply` with the margin set to 1. – iod Jan 25 '19 at 14:38
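
The `matrixStats::rowMedians` function mentioned in the comments works on a numeric matrix, so the numeric columns have to be converted first. A minimal sketch using the question's data, assuming the matrixStats package may or may not be installed (the base R fallback and the name `num` are only there for illustration):

```r
# Example data from the question
DF <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  C1 = c(3, 2, 4, 4, 5),
  C2 = c(3, 7, 3, 4, 5),
  C3 = c(5, 4, 3, 6, 3)
)

# rowMedians() needs a numeric matrix, so drop the ID column first
num <- as.matrix(DF[, c("C1", "C2", "C3")])

DF$CM <- if (requireNamespace("matrixStats", quietly = TRUE)) {
  matrixStats::rowMedians(num, na.rm = TRUE)
} else {
  # Base R fallback when matrixStats is not available
  apply(num, 1, median, na.rm = TRUE)
}

DF
```

Both branches produce the same medians; the package version is mainly of interest for large matrices.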

4 Answers

16

To calculate the median of each row of df (assuming all of its columns are numeric), you can do the following:

df$median <- apply(df, 1, median, na.rm = TRUE)
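
Applied to the question's DF, which also holds the character column ID, the call above needs one extra step: `apply` first converts the data frame to a matrix, and a single character column turns every value into a string. A minimal sketch that selects the numeric columns first (the name `num_cols` is an assumption for illustration):

```r
# Example data from the question
DF <- data.frame(
  ID = c("A", "B", "C", "D", "E"),
  C1 = c(3, 2, 4, 4, 5),
  C2 = c(3, 7, 3, 4, 5),
  C3 = c(5, 4, 3, 6, 3)
)

# Flag the numeric columns so apply() receives a numeric matrix
num_cols <- vapply(DF, is.numeric, logical(1))

# MARGIN = 1 runs median() once per row
DF$CM <- apply(DF[, num_cols, drop = FALSE], 1, median, na.rm = TRUE)

DF$CM  # row medians for A to E: 3 4 3 4 5
```

`drop = FALSE` keeps the selection a data frame even if only one numeric column is present.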
Mathias711
Jiaqi
6

If you would like to use dplyr, you can find an example here, especially mpalanco's answer. Briefly, after using `rowwise` to indicate that the operation should be applied by row (rather than to the entire data frame, which is the default), you can use `mutate` to compute and name a new column from a selection of existing columns. Check the documentation for each of those functions for more details.

E.g.,

library(dplyr)

DF %>% 
  rowwise() %>% 
  mutate(CM = median(c(C1, C2, C3), na.rm = TRUE))

will yield the output:

# A tibble: 5 x 5
  ID       C1    C2    C3    CM
  <fct> <dbl> <dbl> <dbl> <dbl>
1 A         3     3     5     3
2 B         2     7     4     4
3 C         4     3     3     3
4 D         4     4     6     4
5 E         5     5     3     5
5

A slightly more flexible and up-to-date approach: `c_across`, used together with `rowwise`, supports tidy-select semantics. Here `where(is.numeric)` restricts the median calculation to the numeric columns.

library(dplyr)

DF %>%
  rowwise() %>%
  mutate(med = median(c_across(where(is.numeric)), na.rm = TRUE))

# A tibble: 5 x 5
# Rowwise: 
  ID       C1    C2    C3   med
  <chr> <dbl> <dbl> <dbl> <dbl>
1 A         3     3     5     3
2 B         2     7     4     4
3 C         4     3     3     3
4 D         4     4     6     4
5 E         5     5     3     5
Anoushiravan R
0

One-liner that lets you pick the columns you want:

DF$CM <- apply(DF[, c("C1", "C2", "C3")], 1, median)

  ID C1 C2 C3 CM
1  A  3  3  5  3
2  B  2  7  4  4
3  C  4  3  3  3
4  D  4  4  6  4
5  E  5  5  3  5
JAdel