I'm trying to visualize a cross-tabulation in RStudio using ggplot2. I've created plots before, and I have the cross-tabulation itself working, but I can't work out how to plot it. Can anyone help?

Here's my code for an x-tab:

library(dplyr)
data_dan %>%
  group_by(Sex, Segment) %>%
  count(variant) %>%
  mutate(prop = prop.table(n))

and here's what I've got for creating a plot:

library(ggplot2)
# Bar plot of the proportion of each variant within each Segment
variance_art_new.plot = ggplot(data_dan,  aes(Segment, fill=variant)) +
  geom_bar(position="fill")+
  theme_classic()+
  scale_fill_manual(values = c("#fc8d59", "#ffffbf", "#99d594"))
variance_art_new.plot

Here's a sample of the data I'm operating with:

   Word  Segment  variant  Position     Sex
1  LIKE       K       R       End      Female
2  LITE       T       S       End      Male
3 CRACK       K       R       End      Female
4  LIKE       K       R       End      Male
5  LIPE       P       G       End      Female
6  WALK       K       G       End      Female

My aim is to have the independent variables of 'Sex', 'Segment' plotted on a boxplot against the dependent variable 'variant'. I included the first code to show that I can create a table to show this cross-tabulation and the second bit is what I normally do for running a box plot for just one independent variable.

  • So what exactly is the desired output? You need to include a proper [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can run the code and see what it's doing. Are you storing the results of your dplyr mutate commands somewhere? Right now your two chunks of code don't seem related. – MrFlick Nov 21 '17 at 20:18
  • I've edited the main point to hopefully make this clearer, I haven't been storing the results of the dplyr stuff - all I need to do is create the x-tab plot from the data set. I just included that to show that I've tried to crack it and have been able to do so with a table, but not a graph. – MadDanWithABox Nov 21 '17 at 20:33
  • It is not clear what output you are looking for. I *think* you want a barplot (**not** a boxplot), but it is unclear why you are unhappy with the output you have. – Mark Peterson Nov 21 '17 at 20:51
  • So at the moment, my _bar_ plot (you were correct about this, my bad) only works for one variable. I want to add a second variable to it but I don't know how to. I feel like it should be simple, but I don't know what to do. – MadDanWithABox Nov 21 '17 at 21:00
  • I ran a similar plot with a builtin dataset that seems similar to your request. It may help to show your current plot and explain why it doesn't do what you want. Even better would be to use reproducible data so that others can modify it directly. For example, `mtcars %>% ggplot(aes(x = cyl, fill = factor(gear))) + geom_bar(position = "fill")` does what I *think* you are requesting in the question, and it is unclear what about that plot is insufficient for your needs. – Mark Peterson Nov 21 '17 at 21:04
  • So the plot you have there shows the variance of gears to cylinders. All I would want to do is create a plot that shows, say, the variance between gears and cylinders, as well as transmission (auto vs manual), on one plot. I can't really help much with providing more code as I'm very new to this, I'm afraid, but I really appreciate you sitting me down and trying to explain this :) – MadDanWithABox Nov 21 '17 at 21:20
  • I'm glad we got this through for you -- always nice when you can get over the hump. In the future, try to include (or use) reproducible data because it makes it a lot easier to help you. It is also often worthwhile to sketch out what you want the plot to look like (e.g., in paint or on paper) to make it more clear. I'd also encourage you to be a bit more careful with language (e.g., box vs bar plot and "variance", which is not the word I think you want here) to make sure you are communicating your desires clearly. – Mark Peterson Nov 21 '17 at 21:46
  • I sure will, again, many thanks - I'm sure that with time I'll get to grips more with the terminology but your point about sketching a thing out is a good one, I'll keep that in mind - thanks once again! – MadDanWithABox Nov 21 '17 at 22:06

1 Answer

I'm still not sure if this gets all the way to what you are asking, but if you want counts (or proportions) within two separate variables, you can use facet_wrap to split the plot into one panel per level of the second variable.

(Note, all of these are run with theme_set(theme_bw()) because I prefer it for this type of plot.)
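
For completeness, here is a minimal setup sketch for the snippets in this answer. It assumes dplyr and ggplot2 are already installed; the snippets rely on the pipe from dplyr and the plotting functions from ggplot2.

```r
library(dplyr)    # provides the %>% pipe used in the snippets
library(ggplot2)  # ggplot(), geom_bar(), facet_wrap(), theme_set()

# Make theme_bw() the default theme for every plot created afterwards
theme_set(theme_bw())
```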

Working with the builtin dataset mtcars you can get counts with:

mtcars %>%
  ggplot(aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar() +
  facet_wrap(~vs)


Or with the sorting reversed with:

mtcars %>%
  ggplot(aes(x = factor(vs), fill = factor(gear))) +
  geom_bar() +
  facet_wrap(~cyl, labeller = label_both)


You can also plot the within-group distribution by using position = "fill":

mtcars %>%
  ggplot(aes(x = factor(vs), fill = factor(gear))) +
  geom_bar(position = "fill") +
  facet_wrap(~cyl, labeller = label_both) +
  scale_y_continuous(name = "Within group Percentage"
                     , labels = scales::percent)

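
The same idea should carry over to the data in the question. This is only a sketch: the data frame below is rebuilt from the six sample rows shown in the question (the full data_dan is not available), the object name variant_by_segment_sex is made up for illustration, and I am assuming you want Segment on the x-axis with one panel per Sex.

```r
library(ggplot2)

# Six sample rows copied from the question; the real data_dan is larger
data_dan <- data.frame(
  Word     = c("LIKE", "LITE", "CRACK", "LIKE", "LIPE", "WALK"),
  Segment  = c("K", "T", "K", "K", "P", "K"),
  variant  = c("R", "S", "R", "R", "G", "G"),
  Position = rep("End", 6),
  Sex      = c("Female", "Male", "Female", "Male", "Female", "Female")
)

# Proportion of each variant within each Segment, one panel per Sex
variant_by_segment_sex <- ggplot(data_dan, aes(x = Segment, fill = variant)) +
  geom_bar(position = "fill") +
  facet_wrap(~Sex, labeller = label_both) +
  scale_y_continuous(name = "Within-group percentage",
                     labels = scales::percent) +
  scale_fill_manual(values = c("#fc8d59", "#ffffbf", "#99d594")) +
  theme_classic()

variant_by_segment_sex
```

Swapping the roles (x = Sex with facet_wrap(~Segment)) gives the other arrangement, as in the reversed mtcars example above; which one reads better depends on which comparison you care about most.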

Mark Peterson