
I have a table of locations with precipitation values for each month.

I need to add a new column containing the name of the month with the maximum precipitation for each location.

I tried this:

cbind(rainfall, max_month = apply(rainfall[,3:11],1,which.max))

but I only get the column number, whereas I need the column name. This is what I got:

[1] 5 5 5 5 5 5 5 5 4 4 5 5 5 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
 [59] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 4 4 5 5 4 5 5 5 5 5 5 5 5 5 5 5 5
[117] 5 5 5 5 5 5 5 5 5 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 4 4

I also tried the names and colnames functions, but neither of them helped.

names(apply(rainfall[,3:11],1,(which.max))) 

Thanks

Sotos
Michael Spector
    You need `apply(df, 1, function(i) names(i[which.max(i)]))`. However, check the function `max.col` – Sotos May 03 '17 at 13:20
    Could you make your question reproducible? This will make it a lot easier to help you, especially with providing you with alternative solutions. – Paul Hiemstra May 03 '17 at 13:22

2 Answers


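The best way to do this is via max.col, which avoids calling apply on a data frame (apply converts it to a matrix first). Note that max.col returns positions relative to the columns you pass in, so index into the names of that same subset, and that it breaks ties at random unless you set ties.method:

names(rainfall)[3:11][max.col(rainfall[3:11], ties.method = "first")]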
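
As a minimal sketch with made-up data (the station names, values, and column names below are hypothetical, since the original table was only posted as an image), the idea can be checked on a small data frame. Here the month columns sit in positions 3 to 5 rather than 3 to 11:

```r
# Hypothetical toy data; the real table has its month columns in positions 3:11.
rainfall <- data.frame(
  station = c("A", "B", "C"),
  elev    = c(10, 250, 90),
  Jan     = c(12, 40, 5),
  Feb     = c(30, 22, 5),
  Mar     = c(18, 35, 9)
)

month_cols <- 3:5

# max.col() returns positions relative to the columns passed in,
# so look the names up in that same subset of columns.
# ties.method = "first" mirrors which.max(); the default is "random".
idx <- max.col(rainfall[month_cols], ties.method = "first")
rainfall$max_month <- names(rainfall)[month_cols][idx]

print(rainfall)
```

Assigning to `rainfall$max_month` adds the new column directly, so no `cbind` is needed.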
Sotos

You probably need something along the lines of:

names(rainfall[,3:11])[apply(rainfall[,3:11],1,which.max)]

Here you transform the column index into a name by subsetting the vector of month column names, names(rainfall[,3:11]). Note that repeating an index, e.g. c(5, 5, 5, 5), repeats the extracted value, so you get one name per row.
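
The subsetting step can be seen in isolation with a small illustration. The month names and index values below are hypothetical stand-ins for names(rainfall[,3:11]) and the which.max output:

```r
# Hypothetical month names standing in for names(rainfall[, 3:11]).
month_names <- c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar")

# Column positions as which.max() might return them, one per row.
idx <- c(5, 5, 4, 5)

# Indexing with a repeated position repeats the extracted name,
# giving one month name per row of the table.
max_month <- month_names[idx]
print(max_month)
```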


An alternative approach using dplyr:

library(dplyr)
library(tidyr)  # gather() comes from tidyr; mtcars is a built-in dataset, not a package
mtcars %>% 
    gather(month, precip_value, disp, hp, drat, wt) %>% 
    group_by(gear) %>% 
    summarise(max_month = month[which.max(precip_value)])

Note that this approach uses the built-in mtcars dataset because your example was not reproducible. Here, gear plays the role of your station id. The trick is to restructure the data from wide to long format using gather, split it per station using group_by, and then determine the month with the maximum value using summarise. Just food for thought; the answer of @sotos is quite elegant.
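
For comparison, the same wide-to-long idea can be sketched in base R without extra packages. This is not the dplyr code above but a swapped-in equivalent, and the toy rainfall data and its column names are hypothetical:

```r
# Hypothetical toy data with one row per station and one column per month.
rainfall <- data.frame(
  station = c("A", "B", "C"),
  Jan     = c(12, 40, 5),
  Feb     = c(30, 22, 5),
  Mar     = c(18, 35, 9)
)

month_cols <- c("Jan", "Feb", "Mar")

# Wide to long: one row per station/month pair.
# unlist() stacks the month columns one after another, so the month
# labels are repeated with `each` and the stations with `times`.
long <- data.frame(
  station = rep(rainfall$station, times = length(month_cols)),
  month   = rep(month_cols, each = nrow(rainfall)),
  precip  = unlist(rainfall[month_cols], use.names = FALSE)
)

# Split per station and pick the month with the largest value.
max_month <- sapply(
  split(long, long$station),
  function(d) d$month[which.max(d$precip)]
)
print(max_month)
```

The result is a named character vector with one entry per station; which.max returns the first maximum, so ties resolve to the earliest month.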

Paul Hiemstra