
How do I group columns of a dataframe for each dataframe within a list of dataframes and apply group_by() and summarise()?

I'm currently getting the error shown in the title. Here is what I have:


I would like to group_by() and then summarise() as tried here:

trial <- dataset %>%
  group_by(Year) %>%
  summarise(Mean_Max_Temp = mean(Max.Temp),
            Mean_Min_Temp = mean(Min.temp),
            Monthly_Precip = sum(Precipitation))

Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "list"

Vijay Ramesh
  • It looks like `dataset` is a list of data frames, not a data frame. [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with, including a sample of data. – camille Aug 21 '19 at 18:17
  • Because `dataset` is a list of data frames and each of those contains `Year`, you should apply this to an element inside `dataset` – Chelmy88 Aug 21 '19 at 18:21
  • @Chelmy88 How? Please elaborate – Vijay Ramesh Aug 21 '19 at 18:21
  • Also @camille - Do you think it's better to keep it as a list of lists or as multiple dataframes in R, which can then be used to run tidy functions? – Vijay Ramesh Aug 21 '19 at 18:28
  • e.g `trial <- dataset[["data_10.25_76.75"]] %>% ..` – Chelmy88 Aug 21 '19 at 18:30
  • If the issue is regarding data types (it is) but we don't have data that we know will reproduce the issue, we're just guessing. You're calling `group_by` on the wrong type of object (a list, not a data frame). The link I posted has suggestions on making sample data – camille Aug 21 '19 at 18:32
  • @camille I see. Thanks! Will try to create some. But in general, do you recommend keeping a list of lists, or rather 20 odd dataframes in your global env. to which you would apply a bunch of tidy functions? – Vijay Ramesh Aug 21 '19 at 18:33
  • For the question about whether to work with it as a list of data frames: from a data management perspective probably best to keep it that way, but it also will depend on what exactly you're trying to do and what type of output you need – camille Aug 21 '19 at 18:33
  • Hmm, Ideally would like to run some visualizations and EDA on unique columns across different dataframes within the list. Thought it might be easier this way as well. – Vijay Ramesh Aug 21 '19 at 18:34
  • @camille Tried to keep it as a list of lists and I think I got it to work :-) – Vijay Ramesh Aug 21 '19 at 19:19
  • You will have to work on elements of `dataset`, e.g. `dataset[[1]] %>%...`. If you need to work on several elements, you can try `lapply` or `sapply`. – Roman Luštrik Aug 22 '19 at 09:15
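
The suggestions in the comments above (work on one element with `dataset[[...]]`, or on every element with `lapply`) could be sketched roughly as follows. The toy data, the second list name, and the helper `summarise_by_year()` are hypothetical and only mirror the column names in the question; this assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: a named list of data frames with the question's columns.
dataset <- list(
  "data_10.25_76.75" = data.frame(
    Year = c(2000, 2000, 2001),
    Max.Temp = c(30, 32, 31),
    Min.temp = c(20, 22, 21),
    Precipitation = c(5, 10, 0)
  ),
  "data_10.25_77.25" = data.frame(
    Year = c(2000, 2001, 2001),
    Max.Temp = c(28, 29, 31),
    Min.temp = c(18, 19, 21),
    Precipitation = c(2, 4, 6)
  )
)

# Hypothetical helper: the question's summary, applied to ONE data frame.
summarise_by_year <- function(df) {
  df %>%
    group_by(Year) %>%
    summarise(
      Mean_Max_Temp = mean(Max.Temp),
      Mean_Min_Temp = mean(Min.temp),
      Monthly_Precip = sum(Precipitation)
    )
}

# A single element of the list is a data frame, so group_by() works on it.
one <- summarise_by_year(dataset[["data_10.25_76.75"]])

# lapply() applies the helper to every element and returns a list of summaries.
all_sites <- lapply(dataset, summarise_by_year)
```

The result of `lapply` keeps the list names, so `all_sites[["data_10.25_76.75"]]` holds the same summary as `one`.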

1 Answer


As the data frames appear to all have the same structure, you could bind them into one, then also group by the ID variable. I'm guessing your list names are just the longitude and latitude where your meteorological data was collected, so this shouldn't be a problem.

dataset %>% 
  bind_rows(.id = "location") %>%   # adds a "location" column holding the list names
  group_by(location, Year) %>% 
  summarize(Mean_Max_Temp = mean(Max.Temp), Mean_Min_Temp = mean(Min.temp), 
            Monthly_Precip = sum(Precipitation))

Alternatively, you could use map() from the purrr package to apply the same function to each element of your list. This returns a list of summarised data frames, one per element.

dataset %>% 
  map(~summarise(group_by(., Year),
                 Mean_Max_Temp = mean(Max.Temp), 
                 Mean_Min_Temp = mean(Min.temp),
                 Monthly_Precip = sum(Precipitation)))
shs
  • Thanks! What is `location` here? – Vijay Ramesh Aug 21 '19 at 20:04
  • It's an ID variable created from the names of the list element. You can call it what ever you want. I called it that because I thought those were coordinates in your list names. – shs Aug 21 '19 at 20:46
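
To make the role of the ID column concrete, here is a minimal sketch of the bind-then-group approach on made-up data. The list names `site_a` and `site_b` and all values are hypothetical; only the column names come from the question, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy list; the names stand in for the real list element names.
dataset <- list(
  site_a = data.frame(Year = c(2000, 2000), Max.Temp = c(30, 32),
                      Min.temp = c(20, 22), Precipitation = c(5, 10)),
  site_b = data.frame(Year = c(2000, 2001), Max.Temp = c(28, 29),
                      Min.temp = c(18, 19), Precipitation = c(2, 4))
)

# bind_rows() stacks the data frames; .id stores each row's list name
# in a new column, here called "location" (any column name would do).
combined <- bind_rows(dataset, .id = "location")
unique(combined$location)  # "site_a" "site_b"

# Group by the ID column and Year, then summarise as in the question.
by_site_year <- combined %>%
  group_by(location, Year) %>%
  summarise(
    Mean_Max_Temp = mean(Max.Temp),
    Mean_Min_Temp = mean(Min.temp),
    Monthly_Precip = sum(Precipitation)
  ) %>%
  ungroup()
```

Compared with a list of separate summaries, the single combined data frame is usually easier to pass to plotting code, since `location` can be mapped to colour or facets.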