I am trying to convert a categorical data frame with 49 variables (airport station codes) and 41,814 observations into a table and, if possible, a stacked bar chart, with the variables divided into 4 groups based on their frequency.
After converting the data into a data frame, I cannot seem to get anything to work. My work up to this point has been:
    library(tm)         # Corpus, tm_map, DocumentTermMatrix
    library(SnowballC)  # used by stemDocument
    library(ggplot2)    # geom_bar

    # Build a corpus from the station codes and clean it
    corp <- Corpus(VectorSource(OPSLOG2016$Base))
    corp <- tm_map(corp, content_transformer(tolower))
    corp <- tm_map(corp, removePunctuation)
    stopwords("english")[1:100]  # peek at the stop word list
    corp <- tm_map(corp, removeWords, stopwords("english"))
    corp <- tm_map(corp, stripWhitespace)
    corp <- tm_map(corp, stemDocument, language = "english")

    # One row per observation, one column per term (station code)
    freq <- DocumentTermMatrix(corp)
    findFreqTerms(freq, lowfreq = 25)
    sparse <- removeSparseTerms(freq, 0.999)
    freqSparse <- as.data.frame(as.matrix(sparse))

    # My attempts at the 4 groups and the stacked bar chart
    freqSplit <- split(freqSparse, 4)
    geom_bar(mapping = NULL, data = freqSparse, stat = "count", position = "stack",
             width = NULL, binwidth = NULL, na.rm = FALSE,
             show.legend = TRUE, inherit.aes = TRUE)
Here is an example of some of the data I am working with:
       yqt yqu yul yvr ywg yxe yxj yxs yxt yxu yxx yyc yyg yyj yyt yyz yzf
    1    0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
    2    0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
    3    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    4    0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
    5    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    6    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    7    0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
    8    0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
    9    0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
    10   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
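For an indicator layout like the sample above, one possible way to get a per-station frequency table and four groups is sketched below in base R. This is only a rough sketch under assumptions: the station codes in `base` are made up, `toy` stands in for `freqSparse`, and the names `station_counts` and `freq_table` are hypothetical. "Four groups by frequency" is interpreted here as four equal-sized rank groups, which may not be the intended split.

```r
# Made-up station codes standing in for OPSLOG2016$Base (assumption).
base <- c(rep("yvr", 8), rep("yyc", 6), rep("yyz", 5), rep("yul", 4),
          rep("ywg", 3), rep("yxe", 2), rep("yqt", 1), rep("yzf", 1))

# One row per observation, one 0/1 column per station, like the sample above.
toy <- as.data.frame.matrix(table(seq_along(base), base))

# Total number of observations per station, most frequent first.
station_counts <- sort(colSums(toy), decreasing = TRUE)

# Assign each station to one of 4 equal-sized groups by frequency rank.
rank_group <- ceiling(4 * seq_along(station_counts) / length(station_counts))

freq_table <- data.frame(
  station = names(station_counts),
  count   = as.vector(station_counts),
  group   = factor(rank_group, levels = 1:4, labels = paste("Group", 1:4))
)

print(freq_table)
```

With the real data, `colSums(freqSparse)` would take the place of `colSums(toy)`, assuming every column of `freqSparse` is numeric.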
I'm not yet familiar with many of the different packages in R or their features, so if possible, I'd love to be pointed in the right direction.
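As far as the stacked bar chart goes, a `geom_bar()` layer does not draw anything on its own; it has to be added to a `ggplot()` call. The sketch below shows one way this might look with ggplot2, assuming the counts have already been summarised into one row per station. The data frame `freq_table`, its values, and the `group` labels are made up for illustration.

```r
library(ggplot2)

# Made-up summary: one row per station with its count and frequency group.
freq_table <- data.frame(
  station = c("yvr", "yyc", "yyz", "yul", "ywg", "yxe", "yqt", "yzf"),
  count   = c(8, 6, 5, 4, 3, 2, 1, 1),
  group   = factor(rep(paste("Group", 1:4), each = 2))
)

# One bar per group, stacked by station; stat = "identity" uses the
# precomputed counts instead of counting rows.
p <- ggplot(freq_table, aes(x = group, y = count, fill = station)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(x = "Frequency group", y = "Observations", fill = "Station")

print(p)
```

With 49 stations the fill legend becomes long, so it may be easier to read if each group is plotted separately or the legend is dropped.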