0

I got a dataframe that looks like this.

+---+------+------+------+------+
|   | Name | col1 | col2 | col3 |
+---+------+------+------+------+
| 1 |   A  |  10  |  0   |  0   |
| 2 |   B  |  5   |  20  |  5   |
| 3 |   C  |  15  |  15  |  20  |
| 4 |   D  |  20  |  5   |  15  |
| 5 |   F  |  0   |  10  |  15  |
+---+------+------+------+------+

I want the name of maximum for each column. The expected output should look like this:

+---+------+------+
|   |  col |  MAX |
+---+------+------+
| 1 | col1 |   D  |
| 2 | col2 |   B  |
| 3 | col3 |   C  |
+---+------+------+

how do I code it??

BERKz
  • 45
  • 4
  • 1
    Please, use `dput` to provide us with your data. The chances of getting a solution increases a lot by doing that! – PaulS Jan 23 '22 at 14:49
  • 1
    This may help you to use `dput`: https://stackoverflow.com/questions/49994249/example-of-using-dput – PaulS Jan 23 '22 at 14:59
  • 1
    See my [question](https://stackoverflow.com/questions/52023709/what-can-r-do-about-a-messy-data-format) about your posted data format. – Rui Barradas Jan 23 '22 at 15:21

2 Answers2

3

data.table

library(data.table)
setDT(df)

df2 = melt(df, id.vars="Name", variable.name="col")
df2 = df2[, .SD[which.max(value)], by = col][, c("col", "Name")]
names(df2)[2] = "MAX"

Output:

df2

    col MAX
1: col1   D
2: col2   B
3: col3   C

dplyr

library(dplyr)

df2 = df %>% 
  gather(key="col", value="Value", 2:4) %>% 
  top_n(1, Value) %>%
  rename_at(1, ~"MAX") %>% 
  select(c("col", "MAX"))

Output:

df2

  col MAX
1 col1   D
2 col2   B
3 col3   C

base R

It could perhaps also work a bit more simply or asthethic (Update: see akruns base R solution, which is way better)...

df2 = reshape(df, direction="long", varying=2:4, v.names="value")
df2 = df2[order(-df2$value), ]
df2 = df2[!duplicated(df2$time), c("time", "Name")]
names(df2) = c("col", "MAX")
df2$col = paste0("col", df2$col)
rownames(df2) = NULL

Output:

df2

   col MAX
1 col1   D
2 col2   B
3 col3   C
Marco_CH
  • 3,243
  • 8
  • 25
1

In base R, we can do

stack(sapply(df1[-1], \(x) df1$Name[which.max(x)]))[2:1]
   ind values
1 col1      D
2 col2      B
3 col3      C

data

df1 <- structure(list(Name = c("A", "B", "C", "D", "F"), col1 = c(10L, 
5L, 15L, 20L, 0L), col2 = c(0L, 20L, 15L, 5L, 10L), col3 = c(0L, 
5L, 20L, 15L, 15L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))
akrun
  • 874,273
  • 37
  • 540
  • 662