
I have a data frame that looks like this:

gender <- c("F", "M", "M", "M", "M")
entourage <- c("YC; AD; EL", "YC", "AD; YC", "AD", "EL")
data <- data.frame(gender, entourage)

I want to use ggplot to plot the number of times the substrings "YC", "AD", and "EL" occur in entourage. I also want to plot the count of "YC" given that gender is "M".

alexpghayes
  • So what exactly is the question here? Or the desired output? You should provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), not just a picture of data. Clearly show the desired output for the sample input. – MrFlick Apr 21 '16 at 16:59
  • `length(grep("YC", subset(data, gender=="M")$entourage))` ? – Vincent Bonhomme Apr 21 '16 at 17:37

1 Answer


Load the libraries:

library(tidyr)
library(dplyr)
library(ggplot2)

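I believe the crux of the issue is getting your data into a tidy format, or at least something more manageable for plotting: one row per combination of person and entourage member. Split entourage on "; " and unnest the resulting list column to create a tidy data.frame: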

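tidy.df <- data %>%
  # split each entourage string on "; " into a list column
  mutate(ent = strsplit(as.character(entourage), "; ")) %>%
  # name the column to unnest; a bare unnest() is deprecated in current tidyr
  unnest(ent)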

# head(tidy.df)
#   gender  entourage   ent
#   (fctr)     (fctr) (chr)
# 1      F YC; AD; EL    YC
# 2      F YC; AD; EL    AD
# 3      F YC; AD; EL    EL
# 4      M         YC    YC
# 5      M     AD; YC    AD
# 6      M     AD; YC    YC
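
For the counts themselves, including the count of "YC" given that gender is "M", a minimal sketch on top of the tidy data frame could look like the following. The names ent_counts and yc_given_m are illustrative only and are not part of the original answer; the sample data is rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
gender <- c("F", "M", "M", "M", "M")
entourage <- c("YC; AD; EL", "YC", "AD; YC", "AD", "EL")
data <- data.frame(gender, entourage)

# One row per (person, entourage member) pair
tidy.df <- data %>%
  mutate(ent = strsplit(as.character(entourage), "; ")) %>%
  unnest(ent)

# Number of times each substring occurs overall (illustrative name)
ent_counts <- count(tidy.df, ent)
print(ent_counts)

# Number of times "YC" occurs among rows where gender is "M" (illustrative name)
yc_given_m <- tidy.df %>%
  filter(gender == "M", ent == "YC") %>%
  nrow()
print(yc_given_m)  # 2 for the sample data
```

Using count(tidy.df, gender, ent) instead would give every gender/substring combination in one table, which is the same summary geom_bar computes when it draws the bars.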

With the data in this shape, you have lots of options for plotting. Look at the examples for facet_wrap and facet_grid, or use geom_bar(position = "dodge") to draw the counts for each gender side by side:

ggplot(tidy.df, aes(x = ent, fill = gender)) +
  geom_bar(position = "dodge")

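
As one sketch of the faceting option mentioned above, facet_wrap can draw one panel per gender instead of dodged bars. The variable name p is an illustrative assumption, and the data is rebuilt here so the block is self-contained.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question
gender <- c("F", "M", "M", "M", "M")
entourage <- c("YC; AD; EL", "YC", "AD; YC", "AD", "EL")
data <- data.frame(gender, entourage)

# One row per (person, entourage member) pair
tidy.df <- data %>%
  mutate(ent = strsplit(as.character(entourage), "; ")) %>%
  unnest(ent)

# One panel per gender; geom_bar() counts the rows for each value of ent
p <- ggplot(tidy.df, aes(x = ent)) +
  geom_bar() +
  facet_wrap(~ gender)

print(p)
```

To show only the counts given that gender is "M", the same plot can be built from filter(tidy.df, gender == "M") without the facet_wrap layer.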

JasonAizkalns