How to plot two variables (same unit, %) from two columns in ggplot2?

Question

I have a CSV table with three columns that I would like to plot as a line graph using ggplot2 in R. The x axis should use the column "DATE_Out", and the two variables on the y axis should come from the columns "Percent_In" and "Percent_Out" respectively. Note that "Percent_In" and "Percent_Out" are two entirely separate columns, not a single column whose values are split by a grouping type. Table Data Example

Could anyone give me some hints with the R code?

Consider converting `DATE_Out` to the POSIXct time format, then melting the whole thing with `reshape2`, and plotting a line graph using `geom_path()` or `geom_line()` with `colour` set to the group column in the melted dataframe. – 12b345b6b78, Oct 12 '18 at 20:36
Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) — Tung, Oct 12 '18 at 20:50

score 1 · Accepted Answer · answered Oct 12 '18 at 20:47


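library(ggplot2)
library(reshape2)

# Read the table; the question describes columns DATE_Out, Percent_In, Percent_Out
tbl <- read.csv('table.csv')

# Parse the dates, then reshape to long format (columns: DATE_Out, variable, value)
tbl$DATE_Out <- as.Date(tbl$DATE_Out, format = '%m/%d/%Y')
tbl <- melt(tbl, id.vars = 'DATE_Out')

# Refer to `variable` by its bare name inside aes(), not as tbl$variable
plt <- ggplot(data = tbl, aes(x = DATE_Out, y = value))
plt <- plt + geom_path(aes(colour = variable))
plt + theme_minimal() + theme(legend.title = element_blank())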
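
A self-contained variant of the same steps may be easier to experiment with. This is only a sketch: the data frame below is made up to stand in for `table.csv` (the values are invented), and `long` is just an illustrative name for the melted result.

```r
library(ggplot2)
library(reshape2)

# Made-up stand-in for table.csv; the values are invented for illustration only
tbl <- data.frame(
  DATE_Out    = c("01/01/2018", "02/01/2018", "03/01/2018", "04/01/2018"),
  Percent_In  = c(12.5, 15.0, 11.2, 18.4),
  Percent_Out = c(30.1, 28.7, 33.0, 25.6)
)

tbl$DATE_Out <- as.Date(tbl$DATE_Out, format = "%m/%d/%Y")

# Long format: one row per (date, column) pair, in columns `variable` and `value`
long <- melt(tbl, id.vars = "DATE_Out")

# Map colour to the bare column name so each original column gets its own line
plt <- ggplot(long, aes(x = DATE_Out, y = value, colour = variable)) +
  geom_line() +
  theme_minimal() +
  theme(legend.title = element_blank())

print(plt)
```

Mapping `colour = variable` in the top-level `aes()` also groups the rows by series, so each of the two columns is drawn as its own line.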

12b345b6b78

Hi, after applying your code, I got this error message: Error: Aesthetics must be either length 1 or the same as the data (2496): colour, x, y. Do you know what caused this? – DavisTianGeo123 Oct 12 '18 at 22:11
It's very hard to say without seeing your melted dataframe, including its column classes, as well as without seeing your plotting code. I'd venture to guess that at some point during the renaming of the columns something got mixed up (it does seem like you renamed them). Keep in mind that the plotting part of the code refers to colnames in the melted dataframe (since that's what's fed into `ggplot()`), not the colnames in the original dataframe. – 12b345b6b78 Oct 12 '18 at 23:54
The pre-melted dataframe I used had 3 columns (`Date`, `numeric`, `numeric`) – 12b345b6b78 Oct 12 '18 at 23:55
It needs to be something like this: `ggplot(data = tbl, aes(x = DATE_Out, y = value, group = variable))`. Thanks – DavisTianGeo123 Oct 13 '18 at 02:35
How, then, would you propose scaling the two variables on the y axis? I found that if you plot all the data (almost 616 rows after melting), the entire graph becomes extremely compressed and ugly without proper y scaling. – DavisTianGeo123 Oct 13 '18 at 03:30
If you didn't override the range of a continuous scale in ggplot2 with `+ ylim(number1, number2)` or something equivalent (there are actually multiple ways to make this happen), ggplot2 will set the range to be between the largest and the smallest datapoint in the dataset. If you're not happy with how it looks, it sounds like it's just what the data look like. You have options though. You can opt for a taller graph. You can transform the data (probably the least sensible option in this case). Sometimes, you may also choose to simply omit the datapoints (e.g. by using `ylim()`) – 12b345b6b78 Oct 13 '18 at 04:05
The thing is, when I plotted a single variable on y axis at a time, I did not set up anything for scaling y either, the plot result is very good. But when I plotted two of them together in one plot, the result just indicated I need to do something to change it. Do you know why? – DavisTianGeo123 Oct 13 '18 at 04:48
If the variables have non-identical distributions, they will look different (i.e. have different minima and maxima), which could explain the difference you're seeing. Again, it's extremely difficult to advise on such matters in the absence of an actual visualization – 12b345b6b78 Oct 13 '18 at 06:15
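
The y-range behaviour discussed in these comments can be tried out on a small example. The sketch below uses made-up data in which the two series sit on very different ranges, which is one way a combined plot can look compressed. `ylim()` comes from the comments above; `coord_cartesian()` and `facet_wrap()` are not mentioned in the thread and are shown only as other ggplot2 options that may be worth trying.

```r
library(ggplot2)

# Made-up long-format data: two series on very different ranges (illustrative only)
long <- data.frame(
  DATE_Out = rep(as.Date("2018-01-01") + 0:4, times = 2),
  variable = rep(c("Percent_In", "Percent_Out"), each = 5),
  value    = c(1.0, 1.5, 1.2, 1.8, 1.4, 60, 75, 68, 90, 82)
)

base <- ggplot(long, aes(x = DATE_Out, y = value, colour = variable)) +
  geom_line()

# By default the y range spans the smallest and largest value across BOTH series
default_range <- range(long$value)

# ylim() sets the scale limits; points outside them are dropped before drawing
clipped <- base + ylim(0, 50)

# coord_cartesian() only zooms the view; all points stay in the data
zoomed <- base + coord_cartesian(ylim = c(0, 50))

# One panel per series, each with its own y axis
faceted <- base + facet_wrap(~ variable, ncol = 1, scales = "free_y")
```

Printing `base`, `clipped`, `zoomed` and `faceted` in turn shows how each choice changes the picture. Which one is appropriate depends on the real data, which the thread does not show.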

score 0 · Answer 2 · answered Oct 12 '18 at 21:16

The tidyr package offers the gather function which is designed to do just this sort of thing.

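library(dplyr)   # loaded here for the %>% pipe
library(tidyr)

# Inspect the original wide data (opens a viewer in an interactive session)
View(iris)

# Gather two measurement columns into key/value columns and inspect the result
iris %>%
  gather('Measurement', 'Value', Sepal.Length, Sepal.Width) %>%
  View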

I prefer tidyr to reshape because, to me, the functionality is clearer and the functions are more versatile. For example, rather than having to specify all the other variables as id variables in `melt`, I can just specify the variables I wish to gather together. In most of my datasets that is a smaller, cleaner way to code. (See the help page for `dplyr::select` for more details on ways to select which columns are used.)
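
Applied to the columns named in the question, the same idea might look like the sketch below. The data frame is made up (the values are invented for illustration), and the names `Measurement`, `Value` and `long` are arbitrary choices, not anything required by tidyr.

```r
library(tidyr)
library(ggplot2)

# Made-up stand-in for the asker's table (values invented for illustration)
tbl <- data.frame(
  DATE_Out    = as.Date(c("2018-01-01", "2018-02-01", "2018-03-01")),
  Percent_In  = c(12.5, 15.0, 11.2),
  Percent_Out = c(30.1, 28.7, 33.0)
)

# Gather only the two percentage columns; DATE_Out is carried along unchanged
long <- gather(tbl, "Measurement", "Value", Percent_In, Percent_Out)

# One line per gathered column, coloured by the key column
plt <- ggplot(long, aes(x = DATE_Out, y = Value, colour = Measurement)) +
  geom_line()

print(plt)
```

In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`; the equivalent call here would be `pivot_longer(tbl, c(Percent_In, Percent_Out), names_to = "Measurement", values_to = "Value")`.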
