
I have a huge data frame: more than 1000 columns and 20000 rows.

   id      a        b        c        d
1  42      3       NA       NA       NA
2  42     NA        6       NA       NA
3  42     NA       NA        7       NA

The goal is to find the highest value in each column. The only way I know to do that is with mutate, but since there are so many columns, it is impractical to write it out for each column separately.

The result should look like this. Note that for some columns the highest value is NA (the column contains only NA).

   id      a        b        c        d
1  42      3        6        7       NA

2 Answers


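Assuming your data frame is called df, you can do it in base R. Note that max() returns -Inf (with a warning) for a column that is entirely NA, so the second step turns infinite values into NA. sapply() returns a named vector, not a data frame.

df2 <- sapply(df, function(x) max(x, na.rm = TRUE))
df2
#id    a    b    c    d 
#42    3    6    7 -Inf 

# mark the -Inf produced by all-NA columns as NA
is.na(df2) <- sapply(df2, is.infinite)
df2
#id  a  b  c  d 
#42  3  6  7 NA 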
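If you'd rather avoid the warning and the extra -Inf step, a minimal sketch is to return NA directly for all-NA columns. safe_max is a hypothetical helper name (not part of base R), and the sketch assumes the columns are numeric (or all-NA logical). Wrapping the result in as.data.frame() gives a one-row data frame like the desired output:

```r
# Sample data from the question (first column becomes the row names)
df <- read.table(text = "
   id  a  b  c  d
1  42  3 NA NA NA
2  42 NA  6 NA NA
3  42 NA NA  7 NA", header = TRUE)

# Hypothetical helper: column maximum, or NA when the column is entirely NA
safe_max <- function(x) {
  if (all(is.na(x))) NA else max(x, na.rm = TRUE)
}

# lapply() gives one element per column; as.data.frame() makes it a one-row data frame
res <- as.data.frame(lapply(df, safe_max))
res
#   id a b c  d
# 1 42 3 6 7 NA
```

For columns with at least one non-missing value this gives the same result as max(x, na.rm = TRUE).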
patL
df <- read.table(text = "
   id  a  b  c  d
1  42  3 NA NA NA
2  42 NA  6 NA NA
3  42 NA NA  7 NA", header = TRUE)

library(dplyr)
# funs() is deprecated in current dplyr; a ~ lambda does the same job
summarise_all(df, ~ max(.x, na.rm = TRUE))

  id a b c    d
1 42 3 6 7 -Inf

If you want, change -Inf to NA.
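One way to do that conversion, sketched in base R so it works on any data frame of column maxima, including the one returned by summarise_all(). To keep the snippet runnable without dplyr, it rebuilds the same -Inf-containing row with lapply() (an assumption made only for the sketch):

```r
df <- read.table(text = "
   id  a  b  c  d
1  42  3 NA NA NA
2  42 NA  6 NA NA
3  42 NA NA  7 NA", header = TRUE)

# Same one-row result as summarise_all(): the all-NA column d becomes -Inf
res <- suppressWarnings(as.data.frame(lapply(df, max, na.rm = TRUE)))

# Replace infinite values with NA in every column; res[] keeps it a data frame
res[] <- lapply(res, function(x) replace(x, is.infinite(x), NA))
res
#   id a b c  d
# 1 42 3 6 7 NA
```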

Wimpel