R: plotting counts of substrings

Question

I have a data frame that looks like this:

gender <- c("F", "M", "M", "M", "M")
entourage <- c("YC; AD; EL", "YC", "AD; YC", "AD", "EL")
data <- data.frame(gender, entourage)

I want to use ggplot to plot the number of times the substrings "YC", "AD", and "EL" occur in entourage. I also want to plot the count of "YC" given that gender is "M".

So what exactly is the question here? Or the desired output? You should provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), not just a picture of data. Clearly show the desired output for the sample input. — MrFlick, Apr 21 '16 at 16:59

score 2 · Accepted Answer · answered Apr 21 '16 at 17:41

Load the libraries:

library(tidyr)
library(dplyr)
library(ggplot2)

I believe the crux of the issue is getting your data into a tidy format -- or at least something more manageable for plotting. Create a tidy data.frame:

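tidy.df <- data %>%
  # split each entourage string into a list of substrings
  mutate(ent = strsplit(as.character(entourage), "; ")) %>%
  # expand the list column to one row per substring; naming the column
  # avoids the warning newer tidyr versions give for a bare unnest()
  unnest(ent)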

# head(tidy.df)
#   gender  entourage   ent
#   (fctr)     (fctr) (chr)
# 1      F YC; AD; EL    YC
# 2      F YC; AD; EL    AD
# 3      F YC; AD; EL    EL
# 4      M         YC    YC
# 5      M     AD; YC    AD
# 6      M     AD; YC    YC
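
If you only need the split itself, tidyr's separate_rows() may be a shorter route to the same long layout. This is a sketch, not the only way to do it: the names long.df and counts are just ones I made up for illustration, and I rebuild the sample data with stringsAsFactors = FALSE so the block runs on its own.

```r
library(tidyr)
library(dplyr)

# Sample data from the question (strings kept as character, not factor)
data <- data.frame(
  gender = c("F", "M", "M", "M", "M"),
  entourage = c("YC; AD; EL", "YC", "AD; YC", "AD", "EL"),
  stringsAsFactors = FALSE
)

# One row per (person, substring) pair; entourage is overwritten in place
long.df <- separate_rows(data, entourage, sep = "; ")

# Tabulate how often each substring occurs; the count lands in column n
counts <- count(long.df, entourage)
```

Note that separate_rows() replaces the entourage column rather than adding a new ent column, so adjust the aes() mapping accordingly if you plot from long.df.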

Then you have lots of options for plotting. Look at examples for facet_wrap and facet_grid or perhaps geom_bar(position = "dodge").

ggplot(tidy.df, aes(x = ent, fill = gender)) +
  geom_bar(position = "dodge")
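
For the second part of the question (the count of "YC" given that gender is "M"), one possible approach is to filter the tidy data before plotting. The sketch below rebuilds tidy.df so it is self-contained; male.df, n.yc.m, p.male and p.facet are hypothetical names chosen for this example.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Sample data from the question
data <- data.frame(
  gender = c("F", "M", "M", "M", "M"),
  entourage = c("YC; AD; EL", "YC", "AD; YC", "AD", "EL"),
  stringsAsFactors = FALSE
)

# Same tidy layout as above: one row per substring
tidy.df <- data %>%
  mutate(ent = strsplit(entourage, "; ")) %>%
  unnest(ent)

# Keep only the rows where gender is "M"
male.df <- filter(tidy.df, gender == "M")

# The count of "YC" among those rows, as a plain number
n.yc.m <- sum(male.df$ent == "YC")

# Bar chart of substring counts for gender "M"; the "YC" bar is the count asked for
p.male <- ggplot(male.df, aes(x = ent)) +
  geom_bar()

# Alternative: keep both genders and give each its own panel
p.facet <- ggplot(tidy.df, aes(x = ent)) +
  geom_bar() +
  facet_wrap(~ gender)
```

Print p.male or p.facet to draw the plot. If you want a chart with the "YC" bar alone, add ent == "YC" to the filter() call.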
