So, looking at the ggplot2 examples online, it seems that the data are always structured with one observation per row and a consistent set of attributes as the columns.

e.g.

head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Now, I'm trying to graph data that I organized myself, with dates as rows and presidential candidates as columns. Each value is a candidate's overall percentage of the vote in polls.

The data looks like this:

load(url("http://www.clutchmemes.com/Random/GOPArray.RData"))
library(plyr)

averagedGOPPoll <- as.data.frame(aaply(allGOP, 1:2, mean, na.rm=TRUE))
averagedGOPPoll <- cbind(dates=as.Date(rownames(averagedGOPPoll)), averagedGOPPoll)

head(averagedGOPPoll)

                dates Trump Carson Rubio Cruz Bush Rand.Paul Christie Fiorina
2015-01-01 2015-01-01   NaN    NaN   NaN  NaN  NaN       NaN      NaN     NaN
2015-01-02 2015-01-02   NaN    NaN   NaN  NaN  NaN       NaN      NaN     NaN
2015-01-03 2015-01-03   NaN    NaN   NaN  NaN  NaN       NaN      NaN     NaN
2015-01-04 2015-01-04   NaN    NaN   NaN  NaN  NaN       NaN      NaN     NaN
2015-01-05 2015-01-05   NaN    NaN   NaN  NaN  NaN       NaN      NaN     NaN
2015-01-06 2015-01-06   NaN    NaN   NaN  NaN  NaN       NaN      NaN     NaN
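For readers without the RData file: `aaply(allGOP, 1:2, mean, na.rm=TRUE)` averages over every dimension of `allGOP` except the first two. The sketch below assumes (the file's contents aren't shown) that `allGOP` is a 3-D array of dates × candidates × polls, and uses a small hypothetical array `allGOP_toy` with made-up numbers. It also shows where the `NaN` cells come from: with `na.rm = TRUE`, a slice that is entirely `NA` leaves an empty vector, and the mean of an empty vector is `NaN`. Base R's `apply()` is used here so plyr isn't needed; it should give the same averages.

```r
# Hypothetical stand-in for allGOP, ASSUMED to be a 3-D array of
# dates x candidates x polls (made-up numbers; the real file is not shown)
allGOP_toy <- array(
  c(NA, 30, 32,  NA, 20, 21,   # poll 1: Trump (3 dates), then Carson (3 dates)
    NA, 28, NA,  NA, 22, NA),  # poll 2: Trump, then Carson
  dim = c(3, 2, 2),
  dimnames = list(
    date      = c("2015-01-01", "2015-01-02", "2015-01-03"),
    candidate = c("Trump", "Carson"),
    poll      = c("poll1", "poll2")
  )
)

# Average over the third dimension (polls), keeping dates x candidates
averaged <- apply(allGOP_toy, 1:2, mean, na.rm = TRUE)

# On 2015-01-01 every poll is NA, so that cell is the mean of nothing: NaN
print(averaged)

# Same follow-up step as in the question: data frame plus a Date column
averaged_df <- as.data.frame(averaged)
averaged_df$dates <- as.Date(rownames(averaged_df))
```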

How could I graph this as a line chart, with the dates on the x-axis, a separate line for each candidate, and the total vote percentage on the y-axis?

Something like this chart (taken from Huffington Post):

[Image: poll-tracking chart from Huffington Post]

My question is unique because I need a robust way to organize data structured like this: I will be combining it with data from Democratic polls, as well as Google Trends data. Getting the data to where it is now is relatively easy, but I needed a systematic way to get it into a "melted" (long) format. So yes, my question is unique, and the answers fit it specifically. I'm not sure why it was considered repetitive.

  • Maybe transpose the data? Data are usually organized as one row per observation, not one column per observation. – Heroka Dec 19 '15 at 14:57
  • How would I go about transposing this data into a usable form? Thank you. I assumed I would have to transform the data in some way, but I was wondering if there was an easier solution, either 1) some ggplot hacks that would serve the purpose, or 2) another package that could do this easily. – Gilgamesh Skytrooper Dec 19 '15 at 15:12
  • As per @Romain's answer, you need to melt your data. Sorry, transpose wasn't the right word. – Heroka Dec 19 '15 at 15:14
  • When I melt the data, I get a missing-values warning and CGI/CGWindow/CGContext errors, but it seems to graph correctly. Should I be worried? – Gilgamesh Skytrooper Dec 19 '15 at 15:30
  • Well, it really is a kind of transposing, and it is one of the most common problems when creating line graphs. In a spreadsheet, the way you have organized the data is very common and would make a fine line graph. However, the spreadsheet would assume equal spacing between dates, which may not be valid. This answer will help you: http://stackoverflow.com/questions/19921842/plotting-multiple-time-series-on-the-same-plot-using-ggplot – Elin Dec 19 '15 at 15:31
  • @GilgameshSkytrooper, The link to the data appears to be broken. The question and answer would be more useful to others if you fix the link, or include the data in some other way. – bdemarest Dec 20 '15 at 17:10
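On the missing-value warning raised in the comments: the early dates are all `NaN`, and `geom_line()` drops those rows and reports it with a "Removed ... rows containing missing values" warning; the plotted lines are unaffected. A minimal sketch, using a hypothetical long-format data frame `polls_long` (made-up numbers) in place of the melted data, shows two ways to handle it: drop the rows before plotting, or pass `na.rm = TRUE` to `geom_line()` so it removes them silently.

```r
library(ggplot2)

# Hypothetical long-format data: one row per date/candidate pair (made-up numbers)
polls_long <- data.frame(
  date       = as.Date(c("2015-01-01", "2015-01-02", "2015-01-03",
                         "2015-01-01", "2015-01-02", "2015-01-03")),
  candidate  = rep(c("Trump", "Carson"), each = 3),
  percentage = c(NaN, 29, 32, NaN, 21, 21)
)

# Option 1: drop the rows explicitly (is.na() is TRUE for NaN as well as NA)
polls_clean <- polls_long[!is.na(polls_long$percentage), ]
p1 <- ggplot(polls_clean, aes(x = date, y = percentage, colour = candidate)) +
  geom_line()

# Option 2: keep the rows and let geom_line() remove them without a warning
p2 <- ggplot(polls_long, aes(x = date, y = percentage, colour = candidate)) +
  geom_line(na.rm = TRUE)

print(p1)
```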

1 Answer

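You need to "melt" your data frame, i.e. reshape it from wide format (one column per candidate) to long format (one row per date/candidate pair). You can use the reshape2 package, for instance: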

### load packages
library(plyr)
library(reshape2)
library(ggplot2)

### load data
load(url("http://www.clutchmemes.com/Random/GOPArray.RData"))
### convert to data frame
averagedGOPPoll <- as.data.frame(aaply(allGOP, 1:2, mean, na.rm=TRUE))
### add date column
averagedGOPPoll$date <- as.Date(rownames(averagedGOPPoll))
### reshape data frame (wide -> long)
averagedGOPPoll.melt <- melt(averagedGOPPoll, id.vars = "date")
names(averagedGOPPoll.melt) <- c("date", "candidate", "percentage")
### plot
ggplot(averagedGOPPoll.melt,aes(x=date,y=percentage,colour=candidate))+
  geom_line()
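Since the RData link is reported broken in the comments, here is a self-contained version of the same reshape2 steps on a small hypothetical wide data frame `wide` (made-up numbers, same layout as `averagedGOPPoll`: a `date` column plus one column per candidate). `melt()` stacks the candidate columns, so the result has one row per date/candidate pair, which is the shape `ggplot()` needs to map `colour = candidate`.

```r
library(reshape2)
library(ggplot2)

# Hypothetical wide data: rows are dates, columns are candidates (made-up numbers)
wide <- data.frame(
  date   = as.Date(c("2015-01-01", "2015-01-02", "2015-01-03")),
  Trump  = c(27, 29, 32),
  Carson = c(22, 21, 21)
)

# Wide -> long: keep 'date' as the identifier, stack every other column
long <- melt(wide, id.vars = "date")
names(long) <- c("date", "candidate", "percentage")

# 3 dates x 2 candidates = 6 rows, all Trump rows first, then all Carson rows
print(long)

ggplot(long, aes(x = date, y = percentage, colour = candidate)) +
  geom_line()
```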

Another option uses what I think is a newer package, tidyr:

### load packages
library(plyr)
library(tidyr)
library(ggplot2)

### load data
load(url("http://www.clutchmemes.com/Random/GOPArray.RData"))
### convert to data frame
averagedGOPPoll <- as.data.frame(aaply(allGOP, 1:2, mean, na.rm=TRUE))
### add date column
averagedGOPPoll$date <- as.Date(rownames(averagedGOPPoll))
### reshape data frame (wide -> long)
averagedGOPPoll.melt <- gather(averagedGOPPoll, candidate, percentage, -date)
### plot
ggplot(averagedGOPPoll.melt,aes(x=date,y=percentage,colour=candidate))+
  geom_line()
Romain
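For newer code, note that `gather()` has since been superseded in tidyr (1.0.0 and later) by `pivot_longer()`. A minimal sketch of the same reshape, again on a hypothetical wide data frame `wide` with made-up numbers; unlike `gather()`, the new column names are passed as strings via `names_to` and `values_to`.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide data: rows are dates, columns are candidates (made-up numbers)
wide <- data.frame(
  date   = as.Date(c("2015-01-01", "2015-01-02", "2015-01-03")),
  Trump  = c(27, 29, 32),
  Carson = c(22, 21, 21)
)

# Every column except 'date' becomes a (candidate, percentage) pair
long <- pivot_longer(wide, cols = -date,
                     names_to = "candidate", values_to = "percentage")

ggplot(long, aes(x = date, y = percentage, colour = candidate)) +
  geom_line()
```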
  • Nice. Just FYI, you can set the column names inside `melt()`, for instance by using `variable.name="candidate"` and `value.name="percentage"`. – Heroka Dec 19 '15 at 15:15
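Following Heroka's suggestion, the separate `names()` step can be folded into the `melt()` call through its `variable.name` and `value.name` arguments. A short sketch on a hypothetical wide data frame `wide` (made-up numbers):

```r
library(reshape2)

# Hypothetical wide data: rows are dates, columns are candidates (made-up numbers)
wide <- data.frame(
  date   = as.Date(c("2015-01-01", "2015-01-02")),
  Trump  = c(29, 32),
  Carson = c(21, 21)
)

# Name the stacked columns directly instead of renaming afterwards
long <- melt(wide, id.vars = "date",
             variable.name = "candidate", value.name = "percentage")
str(long)
```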