
Doing an assignment for school in which we use a pre-loaded data frame (`midwest`, which ships with ggplot2) to manipulate data with dplyr and display visualizations through Shiny.

I'm getting the error "Problem with `summarise()` input `Illinois`" because "object 'IL' not found", even though IL appears in a column that I thought I had grouped by.

Here's some of my code at the moment.

```r
bar_chart <- function(midwest) {
  data_summary <- midwest %>%
    dplyr::group_by(state) %>%
    summarize("Illinois" = mean(IL, na.rm = TRUE),
              "Minnesota" = mean(MN, na.rm = TRUE),
              "Indiana" = mean(IN, na.rm = TRUE),
              "Ohio" = mean(OH, na.rm = TRUE),
              "Wisconsin" = mean(WN, na.rm = TRUE))
  # ... rest of the function
}
```
heth123
    If you grouped by "state", then there is a column called "state", not "IL". You need to use column names, not values, in `summarize()`. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. What exactly is in the `midwest` object? – MrFlick Dec 09 '20 at 02:40
    What you have there doesn't really make sense; you need to pass `summarize` column names, not values within a column. For example: `midwest %>% group_by(state) %>% summarize(poptotal = mean(poptotal))` – Mako212 Dec 09 '20 at 02:42
  • Welcome to Stack Overflow! As @MrFlick indicated, try to always post a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). That way you will get better answers to your question. – Stereo Dec 09 '20 at 11:25
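A quick way to see the distinction the comments draw between column names and values is to compare `names(midwest)` with the contents of the `state` column. This is a small sketch, assuming ggplot2 is installed (it provides `midwest`). It also shows that the data contains IL, IN, MI, OH and WI only, so `MN` (Minnesota) does not occur, and Wisconsin is abbreviated `WI`, not `WN`.

```r
library(ggplot2)  # the midwest data set ships with ggplot2

# summarize() can only refer to column names
"IL" %in% names(midwest)        # FALSE: IL is not a column
"state" %in% names(midwest)     # TRUE
"poptotal" %in% names(midwest)  # TRUE

# IL is one of the values stored in the state column
sort(unique(midwest$state))     # "IL" "IN" "MI" "OH" "WI"
```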

1 Answer


A couple of things to understand here. Groups specify a level of aggregation, in this case state. When we summarize, we summarize to that level of aggregation: the data set has multiple states, so grouping by state gives us one row for each state. As a result, you don't have to write a line of code for each state like you did in your example.
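To make the "one row per group" idea concrete, here is a minimal sketch (assuming dplyr and ggplot2 are installed; `rows_per_state` and `n_counties` are just illustrative names) that counts the rows in each state. The result has exactly one row per state without any state being named in the code.

```r
library(dplyr)
library(ggplot2)  # provides the midwest data set

# group_by() sets the level of aggregation; summarize() then
# collapses each group to a single row.
rows_per_state <- midwest %>%
  group_by(state) %>%
  summarize(n_counties = n())  # n() counts the rows in each group

rows_per_state
nrow(rows_per_state)  # 5: one row for each state in the data
```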

When we summarize, we need to specify a function to summarize (i.e. roll up) the data with, as well as a column to apply it to: a column name such as `poptotal`, not a value such as `IL`. In this case you're using `mean`, so I'll use that as well, and we'll find the mean of `poptotal` for each state.
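Because each `name = function(column)` pair inside `summarize()` produces one output column, several statistics can be computed in a single call. The sketch below assumes dplyr and ggplot2 are installed; the output names (`mean_poptotal` and so on) are hypothetical choices, while `poptotal` and `popdensity` are columns of `midwest`.

```r
library(dplyr)
library(ggplot2)  # provides the midwest data set

# One output column per name = function(column) pair,
# computed separately for every state.
state_means <- midwest %>%
  group_by(state) %>%
  summarize(
    mean_poptotal   = mean(poptotal),
    mean_popdensity = mean(popdensity),
    max_poptotal    = max(poptotal)
  )

state_means
```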

Finally, while you can use `recode` to replace the abbreviations, my small example below uses a `left_join` with R's built-in `state.abb` and `state.name` vectors to add the full names: a nice trick if you had all 50 states.

```r
library(tidyverse)
data(midwest)

stateTable <- data.frame(state.abb, state.name)

midwest %>%
  group_by(state) %>%
  summarize(poptotal = mean(poptotal)) %>%
  left_join(stateTable, by = c("state" = "state.abb"))
```

```
# A tibble: 5 x 3
  state poptotal state.name
  <chr>    <dbl> <fct>     
1 IL     112065. Illinois  
2 IN      60263. Indiana   
3 MI     111992. Michigan  
4 OH     123263. Ohio      
5 WI      67941. Wisconsin 
```
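For comparison, the `recode` route mentioned in the answer might look like the sketch below, assuming dplyr and ggplot2 are installed (`state_names` is an illustrative name). It spells out each abbreviation by hand, which is fine for five states but is exactly what the join avoids. Newer dplyr releases mark `recode()` as superseded in favour of `case_match()`, though it still works.

```r
library(dplyr)
library(ggplot2)  # provides the midwest data set

# Map each abbreviation to its full name explicitly instead of joining.
state_names <- midwest %>%
  group_by(state) %>%
  summarize(poptotal = mean(poptotal)) %>%
  mutate(state.name = recode(state,
                             IL = "Illinois", IN = "Indiana",
                             MI = "Michigan", OH = "Ohio",
                             WI = "Wisconsin"))

state_names
```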
Mako212