My objective is to take a subset of a data log that recorded airport station codes over time.

I am trying to build a frequency table of how many times each station code was entered, and then plot it as a stacked bar chart using ggplot2's 'fill' option. Additionally, I would like to divide the bases into 4 even groups.

The subset of the data looks like this:

 OPSLOG2016$Base <- c("yyc", "yyc", "ylw", "yvr", "lax", "hnl", "yvr", "yow", "yyz","yyz", "lga", "yyz", "yyz", "YYZ", "yow", "YYC", "YYZ", NA, "hux","yvr", ... <truncated>

Some frequencies of some bases:

#List of 49
$ bos: num 134   
$ cun: num 205
$ fll: num 114
$ hnl: num 95
$ las: num 288
$ lax: num 218
$ lga: num 456
$ lgw: num 169
$ mbj: num 71
$ mco: num 223
$ ogg: num 99

My code up to this point:

corpus = Corpus(VectorSource(OPSLOG2016$Base))
corpus = tm_map(corpus, PlainTextDocument)
basefreq = DocumentTermMatrix(corpus)
sparseBase = removeSparseTerms(basefreq, 0.999)
dfBase = as.data.frame(as.matrix(sparseBase))
qplot(dfBase, y = scale(dfBase, center = TRUE, scale = frequency()))
# Error: ggplot2 doesn't know how to deal with data of class list
# Error during wrapup: cannot open the connection


dfVecSum = lapply(dfBase, sum)
plot(dfVecSum)
# Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' is a list, but does not have components 'x' and 'y'
# Error during wrapup: cannot open the connection
ggplot(dfVecSum, aes(x = dfVecSum, y = Frequency, fill = fill)) +
  geom_bar(position = "fill")
# Error: ggplot2 doesn't know how to deal with data of class list
# Error during wrapup: cannot open the connection

It's likely obvious that I am new to this and am making many mistakes, but I'm hoping to be pointed in the right direction, as I can't seem to get any of this to work on my own.

Michael
  • Hi! Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – David Heckmann Jul 24 '17 at 23:30
  • You need to pare this down to something more comprehensible if you want help. – user101089 Jul 25 '17 at 01:03
  • Your dataset is not a data frame, but a list. Hence the error: `ggplot2 doesn't know how to deal with data of class list`. It may be as simple as first doing `mydf <- as.data.frame(mylist)`, but a shorter example would help! – Remko Duursma Jul 25 '17 at 01:50
  • Ok, I believe I have removed some unnecessary info from my original question. I have also read the thread on creating a reproducible example, though I am a bit thick, so I am still a little lost on the best way to present one. – Michael Jul 25 '17 at 06:07

1 Answer

It looks like you are starting out with a list rather than a data frame. Maybe this will point you in the right direction:

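library(ggplot2)

# Named list of station frequencies (first four values from the question)
yourlist = list(bos = 134, cun = 205, fll = 114, hnl = 95)

# Stack the list into a one-column data frame; the names become row names
df = as.data.frame(do.call(rbind, yourlist))
df$name = rownames(df)
colnames(df)[colnames(df) == "V1"] = "total"

# stat = "identity" uses the values in `total` as the bar heights
ggplot(df, aes(x = name, y = total)) + geom_bar(stat = "identity")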
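
If the counts are not already in a list, it may be simpler to skip the text-mining step and count the codes directly. The following is only a sketch: `base` is a hypothetical stand-in for `OPSLOG2016$Base`, holding just the values visible in the question, and it assumes that upper- and lower-case codes refer to the same station and that `NA` entries should be ignored.

```r
# Hypothetical stand-in for OPSLOG2016$Base (only the values shown in the question)
base <- c("yyc", "yyc", "ylw", "yvr", "lax", "hnl", "yvr", "yow", "yyz", "yyz",
          "lga", "yyz", "yyz", "YYZ", "yow", "YYC", "YYZ", NA, "hux", "yvr")

# tolower() merges "YYZ" with "yyz"; table() drops NA by default
counts <- table(tolower(base))

# A one-dimensional table converts to a data frame with columns Var1 and Freq
df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(df) <- c("name", "total")
```

The resulting `df` has the same `name` and `total` columns as the data frame built from the list, so the same `ggplot()` call should work on it.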

Florian
  • Thanks, Florian, this has helped me a lot. The thing I am still hung up on is that I would like to split all of the stations into 4 equal (or as close to equal as possible) buckets based on their frequencies, ideally visualized as a stacked bar chart. Following your advice, I am currently at: `ggplot(opsDF, aes(x = name, y = total, label = total)) + geom_bar(stat = "identity") + geom_text(size = 5, vjust = 0, color = "blue", nudge_x = 0, nudge_y = 0)` – Michael Jul 25 '17 at 19:33
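
One possible reading of the bucket request in the comment above is "four groups with (nearly) the same number of stations, ordered by frequency". The sketch below works under that assumption: it ranks the stations and maps the ranks onto buckets 1 to 4. The data frame uses the 11 frequencies listed in the question; the column name `bucket` and the labels `Q1` to `Q4` are made up for illustration. If the goal is instead four groups with roughly equal total frequency, a different split would be needed.

```r
library(ggplot2)

# Frequencies copied from the question (11 of the 49 stations)
df <- data.frame(
  name  = c("bos", "cun", "fll", "hnl", "las", "lax", "lga", "lgw", "mbj", "mco", "ogg"),
  total = c(134, 205, 114, 95, 288, 218, 456, 169, 71, 223, 99)
)

# Rank stations by frequency (ties broken by order of appearance),
# then map ranks 1..n onto buckets 1..4 of near-equal size
r <- rank(df$total, ties.method = "first")
df$bucket <- factor(ceiling(r * 4 / nrow(df)), levels = 1:4,
                    labels = c("Q1", "Q2", "Q3", "Q4"))

# One bar per bucket, stacked by station (hypothetical layout choice)
p <- ggplot(df, aes(x = bucket, y = total, fill = name)) +
  geom_bar(stat = "identity", position = "stack")
print(p)
```

Changing `position = "stack"` to `position = "fill"` rescales every bar to the same height, so each segment shows a station's share of its bucket rather than its raw count.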