
I have the following data frame, which I called ozone:

   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9

I would like to extract the highest value from each column (Ozone, Solar.R, Wind, ...).

Also, if possible, how would I sort Solar.R (or any column of this data frame) in descending order?

I tried

max(ozone, na.rm=T)

which gives me the highest value in the dataset.

I have also tried

max(subset(ozone,Ozone))

but got the error "'subset' must be logical".

I can create an object holding a subset of each column with the following commands:

ozone <- subset(ozone, Ozone >0)
max(ozone,na.rm=T) 

but it gives the same value, 334, which is the maximum of the whole data frame, not of the column.

Any help would be great, thanks.

tk421
Al V
  • `max(ozone$Ozone)` or `max(subset(ozone,select=Ozone))`. You should definitely look at some introductory R material on column indexing for data frames, which is your basic trouble. (This is a coursera question, right? https://github.com/ahawker/data-analysis-coursera/blob/master/HW1/hw1.R ) – Ben Bolker Jun 13 '14 at 19:40
  • @BenBolker Yes it is. BTW, how did you gray out sections of your comment? – Al V Jun 13 '14 at 20:48
  • I used backticks `` (I'm sure there's formatting help around here somewhere?) – Ben Bolker Jun 13 '14 at 20:48
  • Thanks! I am following you on GitHub. BTW, the name of the course is now [R Programming](https://www.coursera.org/course/rprog). It's part of the ["Data Specialization Track"](https://www.coursera.org/specialization/jhudatascience/1?utm_medium=courseDescripTop) – Al V Jun 13 '14 at 20:59
  • @BenBolker: Here's a [link to SO comment formatting](http://stackoverflow.com/editing-help#comment-formatting) - always available by clicking the "help" link beside the comment box. – jbaums Jun 14 '14 at 02:46

10 Answers


Similar to colMeans, colSums, etc., you could write a column maximum function, colMax, and a column sort function, colSort.

colMax <- function(data) sapply(data, max, na.rm = TRUE)
colSort <- function(data, ...) sapply(data, sort, ...)

I use ... in the second function so that extra arguments, such as decreasing = TRUE, are passed through to sort (see the last example below).

Get your data:

dat <- read.table(header = TRUE, text = "Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9")

Use colMax function on sample data:

colMax(dat)
#  Ozone Solar.R    Wind    Temp   Month     Day 
#   41.0   313.0    20.1    74.0     5.0     9.0

To do the sorting on a single column,

sort(dat$Solar.R, decreasing = TRUE)
# [1] 313 299 190 149 118  99  19

and over all columns use our colSort function,

colSort(dat, decreasing = TRUE) ## compare with '...' above
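One detail worth seeing: sort() drops NA values by default, so on this data the columns come back with different lengths and sapply() cannot simplify them, which makes colSort() return a list. A minimal sketch, assuming the nine sample rows above (rebuilt with data.frame() so the block runs on its own), of passing na.last = TRUE through the same ... so every column keeps all nine values and the result becomes a matrix:

```r
# Same helper as in the answer: arguments in ... are forwarded to sort()
colSort <- function(data, ...) sapply(data, sort, ...)

# The nine sample rows from the question, rebuilt without read.table()
dat <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1),
  Temp    = c(67, 72, 74, 62, 56, 66, 65, 59, 61),
  Month   = rep(5, 9),
  Day     = 1:9
)

# sort()'s default na.last = NA drops missing values, so the columns have
# different lengths (8, 7, 9, 9, 9, 9) and sapply() returns a list
by_default <- colSort(dat, decreasing = TRUE)
lengths(by_default)

# na.last = TRUE keeps the NAs at the end of each column, every column stays
# at length 9, and sapply() simplifies the result to a 9 x 6 matrix
sorted <- colSort(dat, decreasing = TRUE, na.last = TRUE)
dim(sorted)     # 9 6
sorted[1, ]     # the first row holds each column's maximum
```

With decreasing = TRUE the first row of the matrix doubles as a column-maximum table, matching colMax(dat).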
Rich Scriven

To get the max of any column you want something like:

max(ozone$Ozone, na.rm = TRUE)

To get the max of all columns, you want:

apply(ozone, 2, function(x) max(x, na.rm = TRUE))

And to sort:

ozone[order(ozone$Solar.R),]

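Or to sort the other direction (highest first; NA rows stay at the end, whereas rev(order(...)) would move them to the top):

ozone[order(ozone$Solar.R, decreasing = TRUE),]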
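The placement of missing values is the main difference between reversing an ascending order and asking order() for a decreasing one. A small sketch, assuming only the first two columns of the question's sample data, that compares the two:

```r
# Two columns from the question's sample data are enough here
ozone <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19)
)

# order() puts NAs last by default (na.last = TRUE), so reversing its
# result moves the two NA rows to the top
via_rev <- ozone[rev(order(ozone$Solar.R)), ]

# decreasing = TRUE sorts high-to-low and still leaves the NAs last
via_decreasing <- ozone[order(ozone$Solar.R, decreasing = TRUE), ]

head(via_rev$Solar.R, 3)          # NA NA 313
head(via_decreasing$Solar.R, 3)   # 313 299 190
```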
jbaums
WheresTheAnyKey

Here's a dplyr solution:

library(dplyr)

# find max for each column
summarise_each(ozone, funs(max(., na.rm=TRUE)))

# sort by Solar.R, descending
arrange(ozone, desc(Solar.R))

UPDATE: summarise_each() has been deprecated in favour of a more featureful family of functions: mutate_all(), mutate_at(), mutate_if(), summarise_all(), summarise_at(), summarise_if().

Here is how you could do it:

# find max for each column
ozone %>%
  summarise_if(is.numeric, funs(max(., na.rm = TRUE)))

or

ozone %>%
  summarise_at(vars(1:6), funs(max(., na.rm = TRUE)))
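For comparison, a base-R sketch of the same idea as summarise_if(is.numeric, ...): keep only the numeric columns with Filter(), then take each column's maximum with vapply(). The character column Site is hypothetical, added only to show that non-numeric columns are skipped.

```r
ozone <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1),
  Site    = letters[1:9],   # hypothetical non-numeric column
  stringsAsFactors = FALSE
)

# Filter() keeps only the columns for which is.numeric() is TRUE
num_cols <- Filter(is.numeric, ozone)

# vapply() guarantees exactly one double per column; na.rm = TRUE is passed to max()
col_max <- vapply(num_cols, max, numeric(1), na.rm = TRUE)
col_max   # Ozone 41, Solar.R 313, Wind 20.1
```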
Shery
rrs
  • **For newer dplyr versions**: if you encounter the warning "`funs()` is deprecated as of dplyr 0.8.0. Please use a list of either functions or lambdas", the following lambda code worked for me: `ozone %>% summarise_if(is.numeric, list(~ max(., na.rm=TRUE)))` – eliasmaxil Jan 22 '21 at 20:36

In response to finding the max value for each column, you could try using the apply() function:

> apply(ozone, MARGIN = 2, function(x) max(x, na.rm=TRUE))
  Ozone Solar.R    Wind    Temp   Month     Day 
   41.0   313.0    20.1    74.0     5.0     9.0 
ccapizzano
  • Can you elaborate on what's going on here? – Al V Jun 13 '14 at 21:04
  • Of course, please enter `?apply` in your console to follow along. The function has the following arguments: `apply(X, MARGIN, FUN, ...)`. `X` refers to your array or, in this case, data frame. `MARGIN` specifies how you want the function to be applied to your data frame. For instance, `1` indicates rows while `2` is for columns. `FUN` is the function you wish to apply over your selected `MARGIN`. The above answer creates a user-defined function that finds the max value while disregarding NA values. In brief, the answer locates the max value in each column of your data frame while ignoring NAs. – ccapizzano Jun 14 '14 at 16:18
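To make the MARGIN explanation concrete, here is a small sketch using the first three rows of the question's data (chosen without NAs to keep it short). Because apply() first converts the data frame to a matrix, it suits all-numeric data like this best.

```r
# First three rows of the question's data
dat <- data.frame(
  Ozone   = c(41, 36, 12),
  Solar.R = c(190, 118, 149),
  Wind    = c(7.4, 8.0, 12.6)
)

# MARGIN = 2: FUN is called once per column, giving one maximum per column
col_max <- apply(dat, MARGIN = 2, function(x) max(x, na.rm = TRUE))
col_max   # Ozone 41, Solar.R 190, Wind 12.6

# MARGIN = 1: FUN is called once per row, giving one maximum per row
row_max <- apply(dat, MARGIN = 1, function(x) max(x, na.rm = TRUE))
row_max   # 190 118 149
```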

Another way would be to use pmax (see ?pmax):

do.call('pmax', c(as.data.frame(t(ozone)), na.rm = TRUE))
#[1]  41.0 313.0  20.1  74.0   5.0   9.0
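Why this works: pmax() compares its arguments position by position, and transposing turns each row of the data frame into one argument vector, so position i of the result is the maximum of original column i. A sketch on a hypothetical three-row frame with a couple of NAs:

```r
# pmax() compares its arguments element by element
pmax(c(1, 5, 3), c(4, 2, 6))   # 4 5 6

# Hypothetical three-row data frame with some missing values
dat <- data.frame(
  Ozone   = c(41, NA, 12),
  Solar.R = c(190, 118, NA),
  Wind    = c(7.4, 8.0, 12.6)
)

# After t(), each original row is one column of 'rows' and hence one
# argument to pmax(); element i of the result is the max of column i
rows <- as.data.frame(t(dat))
col_max <- do.call(pmax, c(rows, na.rm = TRUE))
col_max   # 41.0 190.0 12.6
```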
akrun
  • This is neat (+1), but it's worth noting that converting to "matrix" and then back to "data.frame" is slow, so `pmax` loses its speed advantage. (`apply` is slower on a "data.frame", too, for the same reason.) E.g. `DF = as.data.frame(matrix(sample(100, 1e6, T), 1e2, 1e4))` ; `microbenchmark::microbenchmark(sapply(DF, max), do.call(pmax, as.data.frame(t(DF))), apply(DF, 2, max), unlist(lapply(DF, max)), as.matrix(DF), as.data.frame(t(DF)), times = 20)`. P.S. Sorry for the long (and partly irrelevant) comment here, but I do like `pmax` :) – alexis_laz Jun 14 '14 at 09:46

There is a package, matrixStats, that provides functions for column and row summaries (see the package vignette), but you have to convert your data.frame into a matrix first.

Then you run: library(matrixStats); colMaxs(as.matrix(ozone), na.rm = TRUE)

eddy85br
max(may$Ozone, na.rm = TRUE)

Without $Ozone, max() is taken over the whole data frame; this is covered in the swirl package.

I'm studying this course on Coursera too ~

markcodd

max(ozone$Ozone, na.rm = TRUE) should do the trick. Remember to include na.rm = TRUE, or else R will return NA, because the Ozone column contains missing values.
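A two-line illustration of the na.rm behaviour, using the first six Ozone values from the question (the fifth is missing):

```r
# First six Ozone values from the question; the fifth is NA
x <- c(41, 36, 12, 18, NA, 28)

max(x)                 # NA: one missing value makes the maximum unknown
max(x, na.rm = TRUE)   # 41: missing values are dropped first
```

If a column were entirely NA, na.rm = TRUE would leave nothing to compare, and max() returns -Inf with a warning.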


Assuming that your data is in a data.frame called maxinozone, you can do this (the column index goes after the comma; maxinozone[1, ] would select the first row instead):

max(maxinozone[, 1], na.rm = TRUE)
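Which side of the comma the index sits on decides what gets selected: df[i, ] picks rows and df[, j] picks columns. A quick sketch, assuming maxinozone holds the first five rows of the question's first three columns:

```r
# First five rows of the question's first three columns
maxinozone <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA),
  Solar.R = c(190, 118, 149, 313, NA),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3)
)

max(maxinozone[1, ], na.rm = TRUE)         # first ROW (41, 190, 7.4): 190
max(maxinozone[, 1], na.rm = TRUE)         # first COLUMN (Ozone): 41
max(maxinozone[["Ozone"]], na.rm = TRUE)   # same column, selected by name: 41
```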
imharindersingh

Try this solution:

Oz <- subset(ozone, Month == 5, select = Ozone) # select the Ozone values in the month of May (i.e. Month == 5)
summary(Oz)                                     # gives characteristics of the table (1 column of Ozone), including max, min ...
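If only the maximum is needed, it can be read straight off the subset instead of from summary(). The sample data in the question contains only May rows, so this sketch adds three hypothetical June rows (Month == 6) purely to show that the filter works:

```r
# Hypothetical data: the June rows (Month == 6) are made up for illustration
ozone <- data.frame(
  Ozone = c(41, 36, 12, 45, NA, 28),
  Month = c(5, 5, 5, 6, 6, 6)
)

# One-column data frame holding only the May Ozone values
Oz <- subset(ozone, Month == 5, select = Ozone)

summary(Oz)                   # min, quartiles, mean and max of the May values
max(Oz$Ozone, na.rm = TRUE)   # just the maximum: 41 (June's 45 is excluded)
```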
David Arenburg