I'm trying to use sapply to take each item in a list (e.g. "Golf", "Malibu", "Corvette") and create a new list holding, for each item, the highest value in the dataframe that list was split from (e.g. the maximum of cars$sale_price). I'm trying to use an anonymous function to do so, but I can't get that function to work.

The basic issue here is that I'm not very good at writing functions.

First, I took the original dataframe cars and used split to create a list of unique car names - I called this car_names.

Now, I'm trying to create a new list, using sapply, of the highest sale price of each type of car in the list. I'm sure I'm starting the thing correctly ...

price_list <- sapply(car_names, 

... but I can't for the life of me get an anonymous function to simply apply max to all instances of each car name in cars$sale_price.

I've tried a bunch of stuff, all of which has returned an error. Here's an example:

price_list <- sapply(car_names, function(x) {
    max(cars$saleprice[x])
})

Which returns:

Error in h115$nominate_dim1[x] : invalid subscript type 'list'

I'm sure this is trivially simple for even moderately experienced programmers, but I'm ... not one of those! I suspect that I'm pointing to something incorrectly, but I can't get past it. Any ideas?


Edit: Here's a reproducible example.

First, the "source" dataframe:

cars1 <- data.frame("car_names" = c("Corvette", "Corvette", "Corvette", "Golf", "Golf", "Golf", "Malibu", "Malibu", "Malibu"),"saleprice" = c(32000,45000,72000,7500,16000,22000,33000,21000,26500))

Next, splitting the df by car_names:

cars1_split <- split(cars1, cars1$car_names)

Now, attempting to pass max to sapply and getting an error:

maxes <- sapply(cars1_split, function(x){
  max(cars1$saleprice[x])
})

Hopefully this gives you guys something to work with!

logjammin
  • I think the problem is that each element in 'car_names' is a list. If x is a list, max(cars$saleprice[x]) would result in an error. I think that's why you are getting an error, but won't know for sure unless you post a reproducible example. – Dave Rosenman Jan 11 '19 at 18:11
  • You should share a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and show expected output. – markus Jan 11 '19 at 18:55
  • You guys are right, @markus and @David-Rosenman! Thanks for the feedback. I have a busy hour or two but will share a simple reproducible example shortly. – logjammin Jan 11 '19 at 19:04

1 Answer

You have a few options here, let's start with aggregate - not what you asked for but I want to keep your attention high ;)

aggregate(saleprice ~ car_names, cars1, max)
#  car_names saleprice
#1  Corvette     72000
#2      Golf     22000
#3    Malibu     33000

This returns a data.frame (which you can easily split if you need a list).
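
As a minimal sketch of that last remark, assuming the `cars1` data from the question: the aggregated data.frame can be turned into a named list with `split`, or into a named vector with `setNames`. The names `res`, `price_list`, and `price_vec` are only illustrative.

```r
# Example data copied from the question
cars1 <- data.frame(
  car_names = c("Corvette", "Corvette", "Corvette", "Golf", "Golf", "Golf",
                "Malibu", "Malibu", "Malibu"),
  saleprice = c(32000, 45000, 72000, 7500, 16000, 22000, 33000, 21000, 26500)
)

# One row per car name, holding that car's maximum sale price
res <- aggregate(saleprice ~ car_names, cars1, max)

# Named list: one element per car name
price_list <- split(res$saleprice, res$car_names)

# Named numeric vector, if a list is not required
price_vec <- setNames(res$saleprice, as.character(res$car_names))
```

After this, `price_list$Golf` and `price_vec["Golf"]` should both give the Golf maximum.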

aggregate is similar to tapply, which comes next:

tapply(cars1$saleprice, cars1$car_names, FUN = max)
#Corvette     Golf   Malibu 
#   72000    22000    33000

Or try by and which.max

by(cars1, cars1$car_names, FUN = function(x) x[which.max(x$saleprice), ])
#cars1$car_names: Corvette
#  car_names saleprice
#3  Corvette     72000
#-------------------------------
#cars1$car_names: Golf
#  car_names saleprice
#6      Golf     22000
#-------------------------------
#cars1$car_names: Malibu
#  car_names saleprice
#7    Malibu     33000

Finally, you can also use lapply and split (for which by is somewhat shorthand)

lapply(split(cars1, cars1$car_names), function(x) x[which.max(x$saleprice), ])
#$Corvette
#  car_names saleprice
#3  Corvette     72000

#$Golf
#  car_names saleprice
#6      Golf     22000

#$Malibu
#  car_names saleprice
#7    Malibu     33000
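
For completeness, here is a sketch of the `sapply` call the question was aiming for, assuming the `cars1` data from the question. The original attempt fails because each `x` handed to the anonymous function is itself a data.frame (one piece of the split), so it cannot be used as a subscript into `cars1$saleprice`. Since `x` is already the subset, the function only needs the max of its own `saleprice` column.

```r
# Example data copied from the question
cars1 <- data.frame(
  car_names = c("Corvette", "Corvette", "Corvette", "Golf", "Golf", "Golf",
                "Malibu", "Malibu", "Malibu"),
  saleprice = c(32000, 45000, 72000, 7500, 16000, 22000, 33000, 21000, 26500)
)

# Each element of the split is a data.frame holding the rows for one car name
cars1_split <- split(cars1, cars1$car_names)

# x is already the per-car subset, so index its own column instead of cars1
maxes <- sapply(cars1_split, function(x) max(x$saleprice))
maxes
# Corvette     Golf   Malibu 
#    72000    22000    33000
```

Here `sapply` simplifies the result to a named numeric vector; swapping in `lapply` would keep it as a list.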
markus
  • Man, this is great. `aggregate` is so much cleaner! – logjammin Jan 11 '19 at 21:14
  • Followup: if I wanted to do `lapply`, as in your last example, but now I wanted to calculate mean sales prices instead of maxes, how would I do that? There isn't a which.mean, as far as I can tell, and replacing `which.max` with `mean` gives me a messed-up matrix. – logjammin Jan 11 '19 at 21:15
  • @logjammin Sorry for the delay, you would need to do `lapply(split(cars1, cars1$car_names), function(x) mean(x$saleprice))`. But you are better off using `aggregate` here. – markus Jan 12 '19 at 11:10
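
The mean version from the last comment can be sketched as follows, assuming the `cars1` data from the question. Unlike `which.max`, `mean` returns a value rather than a row index, so it is applied directly to the `saleprice` column of each piece. The names `means_list` and `means_df` are only illustrative.

```r
# Example data copied from the question
cars1 <- data.frame(
  car_names = c("Corvette", "Corvette", "Corvette", "Golf", "Golf", "Golf",
                "Malibu", "Malibu", "Malibu"),
  saleprice = c(32000, 45000, 72000, 7500, 16000, 22000, 33000, 21000, 26500)
)

# lapply over the split: one mean per car name, returned as a named list
means_list <- lapply(split(cars1, cars1$car_names),
                     function(x) mean(x$saleprice))

# The aggregate equivalent, returned as a data.frame
means_df <- aggregate(saleprice ~ car_names, cars1, mean)
```

Both results should contain the same three means, one per car name.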