How to find the highest value of a column in a data frame in R?

Question

I have the following data frame which I called ozone:

   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9

I would like to extract the highest value from each of Ozone, Solar.R, Wind...

Also, if possible, how would I sort Solar.R or any column of this data frame in descending order?

I tried

max(ozone, na.rm=T)

which gives me the highest value in the dataset.

I have also tried

max(subset(ozone,Ozone))

but got the error "'subset' must be logical".

I can set an object to hold the subset of each column, by the following commands

ozone <- subset(ozone, Ozone >0)
max(ozone,na.rm=T)

but it gives the same value of 334, which is the max value of the data frame, not the column.

Any help would be great, thanks.

`max(ozone$Ozone)` or `max(subset(ozone,select=Ozone))`. You should definitely look at some introductory R material on column indexing for data frames, which is your basic trouble. (This is a coursera question, right? https://github.com/ahawker/data-analysis-coursera/blob/master/HW1/hw1.R ) — Ben Bolker, Jun 13 '14 at 19:40
@BenBolker Yes it is. btw how did you gray sections of your comment. — Al V, Jun 13 '14 at 20:48
I used backticks `` (I'm sure there's formatting help around here somewhere?) — Ben Bolker, Jun 13 '14 at 20:48
Thanks! I am following you on github, btw the name of the course is now [R Programming](https://www.coursera.org/course/rprog). It's part of the ["Data Specialization Track"](https://www.coursera.org/specialization/jhudatascience/1?utm_medium=courseDescripTop) — Al V, Jun 13 '14 at 20:59
@BenBolker: Here's a [link to SO comment formatting](http://stackoverflow.com/editing-help#comment-formatting) - always available by clicking the "help" link beside the comment box. — jbaums, Jun 14 '14 at 02:46

Rich Scriven · Accepted Answer · 2015-08-15T18:25:02.750

Similar to colMeans, colSums, etc, you could write a column maximum function, colMax, and a column sort function, colSort.

colMax <- function(data) sapply(data, max, na.rm = TRUE)
colSort <- function(data, ...) sapply(data, sort, ...)

I use ... in the second function in hopes of sparking your intrigue: it passes any extra arguments through to sort.

Get your data:

dat <- read.table(header = TRUE, text = "Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9")

Use colMax function on sample data:

colMax(dat)
#  Ozone Solar.R    Wind    Temp   Month     Day 
#   41.0   313.0    20.1    74.0     5.0     9.0

To do the sorting on a single column,

sort(dat$Solar.R, decreasing = TRUE)
# [1] 313 299 190 149 118  99  19

and over all columns use our colSort function,

colSort(dat, decreasing = TRUE) ## compare with '...' above
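A note on what that last call returns, as a sketch on the sample data: `sort` drops NA values by default, so the sorted columns end up with different lengths and `sapply` falls back to a list. Passing `na.last = TRUE` through `...` keeps the NAs, so every column keeps nine values and the result simplifies to a matrix. The names `as_list` and `as_matrix` are only illustrative.

```r
# Sample data from the question, built explicitly so the block is self-contained
dat <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1),
  Temp    = c(67, 72, 74, 62, 56, 66, 65, 59, 61),
  Month   = rep(5, 9),
  Day     = 1:9
)

colSort <- function(data, ...) sapply(data, sort, ...)

# NAs are dropped, column lengths differ, so the result is a list
as_list <- colSort(dat, decreasing = TRUE)

# NAs are kept at the end, all columns have length 9, so the result is a matrix
as_matrix <- colSort(dat, decreasing = TRUE, na.last = TRUE)

lengths(as_list)   # Ozone has 8 values, Solar.R has 7, the rest have 9
dim(as_matrix)     # 9 rows, 6 columns
```

Note that each column is sorted independently, so the rows of the result no longer correspond to the original observations.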

@Frank - right you are. I'm not really doing much around here any more. Feel free to edit and I will make it a community wiki — Rich Scriven, Jun 24 '20 at 00:10

score 49 · Answer 2 · edited Jun 14 '14 at 02:50

To get the max of any column you want something like:

max(ozone$Ozone, na.rm = TRUE)

To get the max of all columns, you want:

apply(ozone, 2, function(x) max(x, na.rm = TRUE))

And to sort:

ozone[order(ozone$Solar.R),]

Or to sort the other direction:

ozone[rev(order(ozone$Solar.R)),]
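One detail worth checking with `rev(order(...))`: `order` places NA last by default, so reversing it moves the NA rows to the top. A minimal sketch on three columns of the sample rows from the question, comparing it with `order`'s own `decreasing` argument, which keeps the NA rows at the bottom (`by_rev` and `by_dec` are illustrative names):

```r
# Three columns of the sample rows from the question
ozone <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Day     = 1:9
)

# Reversing an ascending order: NA rows come first
by_rev <- ozone[rev(order(ozone$Solar.R)), ]

# Asking order() for descending order directly: NA rows stay last
by_dec <- ozone[order(ozone$Solar.R, decreasing = TRUE), ]

head(by_rev$Solar.R, 3)  # NA NA 313
head(by_dec$Solar.R, 3)  # 313 299 190
```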

answered Jun 13 '14 at 19:46 by WheresTheAnyKey · edited Jun 14 '14 at 02:50 by jbaums

To get the max of all columns, it can also be: `apply(ozone, 2, max, na.rm = TRUE)`. – user 31466 Jun 20 '16 at 12:26

score 13 · Answer 3 · edited Dec 03 '17 at 02:04

Here's a dplyr solution:

library(dplyr)

# find max for each column
summarise_each(ozone, funs(max(., na.rm=TRUE)))

# sort by Solar.R, descending
arrange(ozone, desc(Solar.R))

UPDATE: summarise_each() has been deprecated in favour of a more featureful family of functions: mutate_all(), mutate_at(), mutate_if(), summarise_all(), summarise_at(), summarise_if()

Here is how you could do it:

# find max for each column
ozone %>%
         summarise_if(is.numeric, funs(max(., na.rm=TRUE)))%>%
         arrange(Ozone)

or

ozone %>%
         summarise_at(vars(1:6), funs(max(., na.rm=TRUE)))%>%
         arrange(Ozone)
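For more recent dplyr releases, the same two tasks can be sketched with `across()`. This assumes dplyr 1.0.0 or later is installed; the data frame is rebuilt from the question's sample rows, and `col_max` and `sorted` are illustrative names rather than anything from the answer.

```r
library(dplyr)

# A few columns of the sample rows from the question
ozone <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1)
)

# Maximum of every numeric column, ignoring NA; returns a one-row data frame
col_max <- ozone %>%
  summarise(across(where(is.numeric), ~ max(.x, na.rm = TRUE)))

# Whole data frame sorted by Solar.R, descending; arrange() puts NA rows last
sorted <- ozone %>%
  arrange(desc(Solar.R))

col_max                   # Ozone 41, Solar.R 313, Wind 20.1
head(sorted$Solar.R, 3)   # 313 299 190
```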

answered Jun 13 '14 at 20:48 by rrs · edited Dec 03 '17 at 02:04 by Shery

**For newer dplyr versions**: if you encounter the warning that `funs()` is deprecated as of dplyr 0.8.0 ("Please use a list of either functions or lambdas"), the following lambda code worked for me: `ozone %>% summarise_if(is.numeric, list(~ max(., na.rm=TRUE)))` – eliasmaxil Jan 22 '21 at 20:36

score 7 · Answer 4 · answered Jun 13 '14 at 19:44

In response to finding the max value for each column, you could try using the apply() function:

> apply(ozone, MARGIN = 2, function(x) max(x, na.rm=TRUE))
  Ozone Solar.R    Wind    Temp   Month     Day 
   41.0   313.0    20.1    74.0     5.0     9.0

answered Jun 13 '14 at 19:44 by ccapizzano

Can you elaborate on what's going on here? – Al V Jun 13 '14 at 21:04
Of course, please enter `?apply` in your console to follow along. The function has the following arguments: `apply(X, MARGIN, FUN, ...)`. `X` refers to your array or, in this case, data frame. `MARGIN` specifies how you want the function to be applied to your data frame. For instance, `1` indicates rows while `2` is for columns. `FUN` is the function you wish to apply over your selected `MARGIN`. The above answer creates a user-defined function that finds the max value while disregarding NA values. In brief, the answer locates the max value in each column of your data frame while ignoring NAs. – ccapizzano Jun 14 '14 at 16:18
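To make the `MARGIN` argument concrete, here is a small sketch on a made-up data frame (`df`, `row_max` and `col_max` are hypothetical names, not from the question):

```r
# Made-up data: three rows, two columns
df <- data.frame(a = c(1, 5, 3), b = c(4, 2, 6))

# MARGIN = 1 applies max() to each row
row_max <- apply(df, MARGIN = 1, FUN = max)   # 4 5 6

# MARGIN = 2 applies max() to each column
col_max <- apply(df, MARGIN = 2, FUN = max)   # a = 5, b = 6
```

Because `apply` first converts the data frame to a matrix, this approach is best suited to data frames whose columns are all numeric.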

score 3 · Answer 5 · answered Jun 14 '14 at 02:27

Another way would be to use ?pmax

do.call('pmax', c(as.data.frame(t(ozone)),na.rm=TRUE))
#[1]  41.0 313.0  20.1  74.0   5.0   9.0
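How this works, as a rough sketch on a small made-up data frame (`df` and `col_max` are hypothetical names): `t()` turns each original row into a column, `as.data.frame()` splits those columns into separate vectors, and `pmax` then takes the element-wise maximum across them, which leaves one value per original column.

```r
# Made-up data with one missing value
df <- data.frame(a = c(1, 5, 3), b = c(4, NA, 6))

# After transposing, each vector holds one original row: (1, 4), (5, NA), (3, 6)
row_vectors <- as.data.frame(t(df))

# pmax compares the vectors position by position, i.e. column by column
col_max <- do.call(pmax, c(row_vectors, na.rm = TRUE))

col_max  # 5 6
```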

answered Jun 14 '14 at 02:27 by akrun

This is neat (+1), but it's worth noting that converting to "matrix" and then back to "data.frame" is slow, so `pmax` loses its speed advantage (`apply` is slower on a "data.frame", too, for the same reason). E.g. `DF = as.data.frame(matrix(sample(100, 1e6, T), 1e2, 1e4))` ; `microbenchmark::microbenchmark(sapply(DF, max), do.call(pmax, as.data.frame(t(DF))), apply(DF, 2, max), unlist(lapply(DF, max)), as.matrix(DF), as.data.frame(t(DF)), times = 20)`. P.S. Sorry for the long (and partly irrelevant) comment here, but I do like `pmax` :) – alexis_laz Jun 14 '14 at 09:46

score 3 · Answer 6 · answered Sep 28 '16 at 14:23

The matrixStats package provides functions for column and row summaries (see the package vignette), but you have to convert your data.frame into a matrix first.

Then you run: colMaxs(as.matrix(ozone), na.rm = TRUE)

Without na.rm = TRUE, any column that contains NA returns NA.

answered Sep 28 '16 at 14:23 by eddy85br

score 2 · Answer 7 · answered May 10 '15 at 20:40

max(may$Ozone, na.rm = TRUE)

Without $Ozone, max() is taken over the whole data frame rather than the one column. This is covered in the swirl package's lessons.

I'm studying this course on Coursera too.

answered May 10 '15 at 20:40 by markcodd

score 1 · Answer 8 · answered Mar 10 '15 at 22:59

max(ozone$Ozone, na.rm = TRUE) should do the trick. Remember to include na.rm = TRUE; otherwise R will return NA, because the column contains missing values.
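A quick illustration of that point on a made-up vector (`x` and the result names are hypothetical):

```r
# A vector with one missing value
x <- c(41, 36, NA, 28)

with_na    <- max(x)                # NA: a missing value makes the maximum unknown
without_na <- max(x, na.rm = TRUE)  # 41: NA is dropped before taking the maximum
```

If every value in a column is NA, `max(x, na.rm = TRUE)` returns `-Inf` with a warning, so it can be worth checking for that case.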

answered Mar 10 '15 at 22:59 by WallyTaylor

score 1 · Answer 9 · answered Jun 05 '15 at 15:32

Assuming that your data is in a data.frame called maxinozone, you can do this for the first column:

max(maxinozone[, 1], na.rm = TRUE)

answered Jun 05 '15 at 15:32 by imharindersingh

score 0 · Answer 10 · edited Oct 18 '15 at 12:03

Try this solution:

Oz <- subset(data, Month == 5, select = Ozone)  # select Ozone values for May (i.e. Month == 5)
summary(Oz)                                     # gives characteristics of the table (one Ozone column), including max, min ...

answered Oct 18 '15 at 11:54 by S.ElBahloul · edited Oct 18 '15 at 12:03 by David Arenburg
