So I have a variable "Body" whose observations are sentences, and another variable "Postcategory" whose observations are either "Low-quality post" or "High-quality post". I have counted the words in each observation of "Body", and now I want to make a boxplot that shows the median number of words in "Body" for both low-quality and high-quality posts.
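The middle line of each box is the group median, so it can help to compute those medians directly as a check on the plot. Below is a minimal base R sketch using `aggregate()`. It assumes the word counts are already stored in a column named `word` next to `Postcategory`, and the counts are made-up values for illustration only.

```r
# Hypothetical data: word counts already computed for each post
data <- data.frame(
  word = c(2, 7, 1, 8, 8, 3),
  Postcategory = c(rep("Low-quality post", 3), rep("High-quality post", 3)),
  stringsAsFactors = FALSE
)

# Median number of words per category (what the line inside each box shows)
medians <- aggregate(word ~ Postcategory, data = data, FUN = median)
print(medians)
```

With these invented counts the low-quality median is 2 and the high-quality median is 8.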
To count the words in each observation of "Body", I used the following code and assigned the result to the value "word":

```r
word <- lengths(strsplit(data$Body, '\\S+'))
```
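For comparison, here is a minimal sketch of a word count that does not depend on how the text starts or ends, assuming a "word" is any run of non-whitespace characters. `gregexpr("\\S+", x)` locates every such run, `regmatches()` extracts them, and `lengths()` counts them per observation (0 for an empty string). The `Body` values are invented, and `count_words` is a hypothetical helper name. The counts are stored as a column of `data` so ggplot2 can later find them by name.

```r
# Hypothetical example data standing in for the real `data`
data <- data.frame(
  Body = c("This is a short post.",
           "  Leading and trailing spaces  ",
           "",
           "One"),
  Postcategory = c("Low-quality post", "High-quality post",
                   "Low-quality post", "High-quality post"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every run of non-whitespace characters counts as one word
count_words <- function(x) {
  lengths(regmatches(x, gregexpr("\\S+", x)))
}

# Keep the counts in the same data frame as the text they came from
data$word <- count_words(data$Body)
print(data$word)
# [1] 5 4 0 1
```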
I then used the following code to try to create the boxplot with ggplot2:
```r
library(ggplot2)

ggplot(data, aes(x = Postcategory, y = word)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16,
               outlier.size = 2, notch = FALSE)
```
I know this is wrong, but I can't work out what to change to get the result I want.
I also made a quick sketch of how I would like the final boxplot to look (the values in it are not correct).
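One possible way to get this kind of plot is sketched below, assuming ggplot2 3.3.0 or later (for `fun =` and `after_stat()`) and using made-up posts in place of the real data. The main idea is to store the word counts as a column of `data` before plotting, so both aesthetics come from the same data frame. The `stat_summary()` layer that prints each group's median above the box is an optional extra, not part of the original attempt.

```r
library(ggplot2)

# Hypothetical example data; replace with the real `data`
data <- data.frame(
  Body = c("Short post.",
           "A slightly longer low quality post here.",
           "Bad.",
           "This high quality post explains the problem clearly.",
           "Another well written and detailed high quality post.",
           "Clear and concise."),
  Postcategory = c(rep("Low-quality post", 3), rep("High-quality post", 3)),
  stringsAsFactors = FALSE
)

# Word count per observation, kept as a column of the same data frame
data$word <- lengths(regmatches(data$Body, gregexpr("\\S+", data$Body)))

p <- ggplot(data, aes(x = Postcategory, y = word)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16,
               outlier.size = 2, notch = FALSE) +
  # Optional: write each group's median just above its median line
  stat_summary(fun = median, geom = "text",
               aes(label = after_stat(y)), vjust = -0.5) +
  labs(x = "Post category", y = "Number of words in Body")

print(p)
```

With these invented posts the boxes' median lines sit at 2 words (low quality) and 8 words (high quality).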