
I'm trying to learn R by doing some exploratory data analysis on the BRFSS data set (https://www.cdc.gov/brfss/). The idea is to use both dplyr and ggplot2.

I have the following code:

```r
library(dplyr)

# Count veterans and non-veterans for each value of menthlth
brfss2013 %>%
  filter(!is.na(menthlth), !is.na(veteran3)) %>%
  group_by(menthlth) %>%
  summarise(vcount = sum(veteran3 == "Yes"), nvcount = sum(veteran3 == "No"))
```

I'd like to create a side-by-side bar chart with `menthlth` (values 0 to 30) on the x-axis and, for each value of `menthlth`, two bars: `vcount` on the left and `nvcount` on the right. I know I can pipe the result of the last line into a `ggplot()` call, but I don't understand how to make the bars appear side by side.

I tried to assign the output of summarise to a variable, so that I could use the melt command, or something similar, but that resulted in an error ("object 'veteran3' not found"). Is there a simpler way to plot two variables side by side directly?

Thank you for your help, and sorry if I'm missing something obvious.

EDIT: I've now assigned the result to a variable `a`, and `dput(head(a, 10))` gives:

```r
structure(list(menthlth = 0:9,
               vcount = c(46931L, 1221L, 1861L, 1083L, 545L, 1323L, 197L, 466L, 105L, 22L),
               nvcount = c(287025L, 13964L, 21633L, 12505L, 6111L, 15312L, 1664L, 5882L, 1139L, 175L)),
          row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
```
tjebo
HelloWorld4444
  • Hi. It would help massively if you could post the output of `dput(head(your_data, 10))`, so that we can work with your data. – tjebo Sep 13 '18 at 16:54
  • Just as you would use your pipe normally: `... %>% dput(head(., 10))`. Or assign a name to the result, which might be better anyway. – tjebo Sep 13 '18 at 17:01
  • You need to `melt`/`gather` your data: see this [possible duplicate](https://stackoverflow.com/questions/46916042/simple-bar-plot-with-multiple-variables-in-r-similar-to-excel) or [this one](https://stackoverflow.com/questions/47980079/using-multiple-variables-in-geom-bar-with-ggplot-at-same-x-r). – pogibas Sep 13 '18 at 17:02
  • @Tjebo When I try assigning `summarise` to a variable (`a <- summarise(...)`), I get an error, and when I try piping I get this: "Error in dput(., head(., 10)): 'file' must be a character string or connection" – HelloWorld4444 Sep 13 '18 at 17:09
  • Just take the entire code you showed us above and put `a <-` in front of it. – tjebo Sep 13 '18 at 17:19
  • Do you need a clustered column chart for `vcount` and `nvcount`? Something like https://stackoverflow.com/questions/52023975/how-to-create-cluster-column-chart-with-r/52024038#52024038 – Sal-laS Sep 13 '18 at 17:58
  • It's kind of confusing, but to assign the result of a piping operation, you put the assignment operator **before** the entire chain. So `a <- df %>% filter(...) %>% mutate(...)` would run the whole pipe and assign the final result to the variable `a`. – divibisan Sep 13 '18 at 18:02
  • Thanks @divibisan. That's good to know. I realised the error message was the same one from dput, since I'd kept that line in the code. – HelloWorld4444 Sep 13 '18 at 18:14
  • @SalmanLashkarara Yes, that is what I'm trying to get - just not sure how to get my data into the right format/structure. – HelloWorld4444 Sep 13 '18 at 18:19
  • @Tjebo I now have the output of dput. Sorry it took so long. – HelloWorld4444 Sep 13 '18 at 18:19
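Two points from the comments above, put together: the assignment goes in front of the whole chain, and `dput(head(., 10))` inside a pipe fails because magrittr still inserts the left-hand side as the first argument of `dput()`, so `head(., 10)` lands in `dput()`'s `file` argument (hence the error quoted above). Piping into `head()` first avoids that. A minimal sketch, assuming a small made-up data frame `toy` as a stand-in for `brfss2013` (its values are hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for brfss2013; the real data set is much larger
toy <- data.frame(
  menthlth = c(0, 0, 1, 2, NA),
  veteran3 = c("Yes", "No", "No", "Yes", "No")
)

# The assignment goes before the entire chain; `a` receives the final result
a <- toy %>%
  filter(!is.na(menthlth), !is.na(veteran3)) %>%
  group_by(menthlth) %>%
  summarise(vcount = sum(veteran3 == "Yes"), nvcount = sum(veteran3 == "No"))

# Pipe into head() first, then dput(). Writing dput(head(., 10)) instead
# would become dput(., head(., 10)), passing head(., 10) as the `file` argument
a %>% head(10) %>% dput()
```

Outside a pipe, `dput(head(a, 10))` does the same thing.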

2 Answers

```r
library(tidyverse)

# Data posted in the question (the output of dput(head(a, 10)))
dat_ <- structure(list(menthlth = 0:9, vcount = c(46931L, 1221L, 1861L, 1083L, 545L, 1323L, 197L, 466L, 105L, 22L), nvcount = c(287025L, 13964L, 21633L, 12505L, 6111L, 15312L, 1664L, 5882L, 1139L, 175L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

# Reshape to long format for plotting: one row per (menthlth, group) pair
plot_dat <- dat_ %>% gather(group, y, 2:3)

ggplot() +
  geom_col(data = plot_dat,
           aes(as.character(menthlth), y, fill = group),
           position = position_dodge())
```

You should make your x variable discrete (`as.character(menthlth)`) and use `position = position_dodge()`, because otherwise the columns are stacked (try omitting it to see the difference).
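In current tidyr, `gather()` still works but is superseded by `pivot_longer()`. The sketch below, assuming tidyr 1.0 or later, does the same reshaping on the data from the question and handles two details the question asks about. It sets the factor levels so that `vcount` is drawn on the left and `nvcount` on the right (dodged bars follow the level order, and the default alphabetical order would put `nvcount` first). It also uses `factor(menthlth)` rather than `as.character(menthlth)`, because character values sort as text, which would place 10 before 2 once the full 0 to 30 range is plotted.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The ten rows posted in the question (output of dput(head(a, 10)))
dat_ <- structure(list(menthlth = 0:9,
  vcount = c(46931L, 1221L, 1861L, 1083L, 545L, 1323L, 197L, 466L, 105L, 22L),
  nvcount = c(287025L, 13964L, 21633L, 12505L, 6111L, 15312L, 1664L, 5882L, 1139L, 175L)),
  row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

# Same long layout as gather(): one row per (menthlth, group) pair
plot_dat <- dat_ %>%
  pivot_longer(cols = c(vcount, nvcount), names_to = "group", values_to = "y") %>%
  # Dodged bars follow the factor level order, so listing vcount first
  # puts it on the left and nvcount on the right
  mutate(group = factor(group, levels = c("vcount", "nvcount")))

# factor() keeps numeric order on a discrete x-axis
p <- ggplot(plot_dat, aes(x = factor(menthlth), y = y, fill = group)) +
  geom_col(position = position_dodge()) +
  labs(x = "menthlth", y = "count")

print(p)
```

Replacing `dat_` with the full summarised table (`a` in the question) should give one pair of bars per `menthlth` value.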

tjebo

I don't have access to your data, but based on your example I made the dataset below:

```r
dt <- data.frame(menthlth = sample(1:10, 10),
                 vcount = sample(1:1000, 10),
                 nvcount = sample(1:1000, 10))
```

You first need to restructure your dataset into long format:

```r
NewDT <- data.frame(menthlth = dt$menthlth,
                    category = c(rep("vcount", length(dt$menthlth)),
                                 rep("nvcount", length(dt$menthlth))),
                    value = c(dt$vcount, dt$nvcount))
```

and then make the bar chart:

```r
library(ggplot2)

ggplot(data = NewDT, aes(x = menthlth, y = value, fill = category)) +
  geom_bar(stat = "identity", position = position_dodge())
```

The result is:

[Image: side-by-side bar chart of `vcount` and `nvcount` for each value of `menthlth`]
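The manual restructuring in this answer can also be done with base R's `stack()`, without writing out the `rep()` calls by hand. A sketch, assuming the same kind of randomly generated `dt` as above (the `set.seed()` call is added only to make the random data reproducible). It also sets the `category` levels explicitly, so that `vcount` is drawn to the left of `nvcount` as the question asks:

```r
# Random example data in the same shape as this answer's dt
set.seed(1)
dt <- data.frame(menthlth = sample(1:10, 10),
                 vcount   = sample(1:1000, 10),
                 nvcount  = sample(1:1000, 10))

# stack() puts all vcount values first, then all nvcount values, and
# records the source column name in a factor (renamed to `category` here)
long <- stack(dt[c("vcount", "nvcount")])
names(long) <- c("value", "category")

# Fix the level order explicitly: vcount first, so it is dodged to the left
long$category <- factor(long$category, levels = c("vcount", "nvcount"))

# Repeat menthlth once per stacked column, then reorder the columns
long$menthlth <- rep(dt$menthlth, times = 2)
long <- long[c("menthlth", "category", "value")]
```

`long` has the same columns as `NewDT`, so it can be passed to the same `ggplot()` call.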

Sal-laS