I am trying to use ggplot2 to plot the monthly wheat prices in France for each agricultural year in 1845-1848. I am given the following table:
year,January,February,March,April,May,June,July,August,September,October,November,December
1845,,,,,,,,20.17,20.3,21.51,22.27,22.32
1846,22.36,22.65,22.42,22.26,22.48,22.93,22.92,24,24.9,25.97,27.59,28.01
1847,30.16,33.5,37.69,37.54,37.98,33.5,28.42,23.63,22.57,22.01,20.76,20.36
1848,20.01,19.34,18.12,16.59,16.58,15.88,15.67,,,,,
I want to plot the data with lines and points in the following way:
- have the months on x and the prices on y
- group by year: each year gets its own line (four lines)
- where there is no data (NA) there should be no point and no line
This task is extremely easy to solve in LibreOffice Calc in just a few clicks: select the whole table > insert chart > line > Points and line > next > data series in rows + first row as label + first column as label > finish (8 clicks).
But I can't seem to find a way to do the same using R and ggplot2.
I need to be able to solve this in R to apply further statistical analysis to the series.
I have tried the following solution:
# Reading the data
wheat <- read_csv("data/wheat.csv")
# Plotting
wheat %>%
ggplot(aes(x=wheat[0,])) +
geom_line(aes(y=as.numeric(wheat[1,]), group="year")) +
geom_point()
I would expect code like this to produce the desired plot.
But I get the error
"Don't know how to automatically pick scale for object of type tbl_df/tbl/data.frame. Defaulting to continuous. Error: Aesthetics must be either length 1 or the same as the data (4): y, x".
I understand that ggplot sees a 4x13 tibble and expects y to have the same length as the data (4).
But I want to feed it the table rows as y values.
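To check what those row subsets actually contain, here is a minimal sketch. It assumes the table is read from an inline string rather than from data/wheat.csv, so that it runs on its own. Since R indexes from 1, wheat[0, ] is an empty tibble rather than the header row, and a single row converted with as.numeric() gives 13 values, which cannot be matched against the 4 rows of the data:

```r
library(readr)

# Same table as above, read from an inline string (assumption: same contents as data/wheat.csv)
wheat <- read_csv("year,January,February,March,April,May,June,July,August,September,October,November,December
1845,,,,,,,,20.17,20.3,21.51,22.27,22.32
1846,22.36,22.65,22.42,22.26,22.48,22.93,22.92,24,24.9,25.97,27.59,28.01
1847,30.16,33.5,37.69,37.54,37.98,33.5,28.42,23.63,22.57,22.01,20.76,20.36
1848,20.01,19.34,18.12,16.59,16.58,15.88,15.67,,,,,")

# Row 0 does not exist in R: this is a tibble with 0 rows and 13 columns
dim(wheat[0, ])

# One row flattened to a numeric vector has 13 elements (year + 12 months),
# while the data passed to ggplot() has 4 rows
length(as.numeric(wheat[1, ]))
nrow(wheat)
```

Note also that group="year" inside aes() is a constant string, not a reference to the year column.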
Thanks for any help!
EDIT
My question is not a duplicate of "Constructing a line graph using ggplot2".
Though it is the same general problem - plotting several vectors of one data frame and, for that, preparing the data to be usable with ggplot - the initial data is very different: mine is historical data that must be organized chronologically, hence the need to specify the levels by which the data will be ordered on x. In addition, the initial table is in a wide layout and required reshaping with gather.
Here is the whole working code for reference:
library(tidyverse)
# Reading into a tibble:
wheat <- read_csv("year,January,February,March,April,May,June,July,August,September,October,November,December
1845,,,,,,,,20.17,20.3,21.51,22.27,22.32
1846,22.36,22.65,22.42,22.26,22.48,22.93,22.92,24,24.9,25.97,27.59,28.01
1847,30.16,33.5,37.69,37.54,37.98,33.5,28.42,23.63,22.57,22.01,20.76,20.36
1848,20.01,19.34,18.12,16.59,16.58,15.88,15.67,,,,,")
# Tidying:
wheat_tidy <- wheat %>% gather(month, price, -year)
# Leveling:
wheat_tidy$month <- factor(wheat_tidy$month, levels = c("January","February","March","April","May","June","July","August","September","October","November","December"))
# Plotting:
wheat_tidy %>%
  ggplot(aes(x = month, y = price, group = year, color = as.factor(year))) +
  geom_line() +
  geom_point()
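For comparison, here is a sketch of the same pipeline using pivot_longer() from tidyr, which swaps in for gather() here. This is an alternative, not the code I originally used. The name wheat_long is only illustrative. It assumes tidyr 1.0 or later, and it uses the built-in month.name constant instead of typing the twelve levels by hand. Dropping the NA rows with values_drop_na = TRUE is safe here only because the missing months sit at the start of 1845 and the end of 1848, so no line is bridged across a gap:

```r
library(tidyverse)

# Same inline table as above
wheat <- read_csv("year,January,February,March,April,May,June,July,August,September,October,November,December
1845,,,,,,,,20.17,20.3,21.51,22.27,22.32
1846,22.36,22.65,22.42,22.26,22.48,22.93,22.92,24,24.9,25.97,27.59,28.01
1847,30.16,33.5,37.69,37.54,37.98,33.5,28.42,23.63,22.57,22.01,20.76,20.36
1848,20.01,19.34,18.12,16.59,16.58,15.88,15.67,,,,,")

# Reshape from wide (one column per month) to long (one row per year/month),
# dropping the months with no price, then order months chronologically
wheat_long <- wheat %>%
  pivot_longer(cols = -year, names_to = "month", values_to = "price",
               values_drop_na = TRUE) %>%
  mutate(month = factor(month, levels = month.name))

# One line per year, with a point at every observed month
p <- ggplot(wheat_long,
            aes(x = month, y = price, group = year, colour = factor(year))) +
  geom_line() +
  geom_point() +
  labs(colour = "year")
p
```

If a series had gaps in the middle, keeping the NA rows (leaving out values_drop_na) would be the safer choice, because geom_line() breaks the line at missing values instead of connecting across them.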