I am trying to use ggplot2 to plot the monthly wheat prices in France for each agricultural year in 1845-1848. I am given the following table:

year,January,February,March,April,May,June,July,August,September,October,November,December
1845,,,,,,,,20.17,20.3,21.51,22.27,22.32
1846,22.36,22.65,22.42,22.26,22.48,22.93,22.92,24,24.9,25.97,27.59,28.01
1847,30.16,33.5,37.69,37.54,37.98,33.5,28.42,23.63,22.57,22.01,20.76,20.36
1848,20.01,19.34,18.12,16.59,16.58,15.88,15.67,,,,,

I want to plot the data with lines and points the following way:

  1. have the months on x and the prices on y
  2. group by year: each year gets its own line (four lines)
  3. where there is no data (NA) there should be no point and no line

This task is extremely easy to solve in LibreOffice Calc in just a few clicks: select the whole table > insert chart > line > Points and line > next > data series in rows + first row as label + first column as label > finish (8 clicks).

But I can't seem to find a way to do the same using R and ggplot2.

I need to be able to solve this in R to apply further statistical analysis to the series.

I have tried the following solution:

# Reading the data
wheat <- read_csv("data/wheat.csv")

# Plotting
wheat %>%
  ggplot(aes(x=wheat[0,])) +
  geom_line(aes(y=as.numeric(wheat[1,]), group="year")) +
  geom_point()

I would have thought that code like this would produce the desired plot.

But I get the error

"Don't know how to automatically pick scale for object of type tbl_df/tbl/data.frame. Defaulting to continuous. Error: Aesthetics must be either length 1 or the same as the data (4): y, x".

I understand that ggplot sees a 4x13 tibble and expects y to have the same length (4).

But I want to feed it the table rows as y values.
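
To make the row-versus-column issue concrete, here is a small sketch of what the two row selections in the attempt actually contain. It is an illustration only: `wheat_small` is a hypothetical two-year, three-month excerpt of the table, read with base R's `read.csv` rather than `read_csv`.

```r
# Illustration only: a two-row, three-month excerpt of the table above.
wheat_small <- read.csv(text = "year,January,February,March
1846,22.36,22.65,22.42
1847,30.16,33.5,37.69")

# R indexes from 1, so row 0 selects nothing: a data frame with zero rows.
nrow(wheat_small[0, ])

# Row 1 is the first data row (1846 here); it is still a one-row
# data frame that includes the year column, not a plain vector.
class(wheat_small[1, ])

# The month names live in the column names, not in any row.
names(wheat_small)[-1]

# One year's prices as a plain numeric vector (year column dropped).
unlist(wheat_small[1, -1], use.names = FALSE)
```

Since `aes()` maps columns of the data to aesthetics, the months have to become values in a column before they can serve as x.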

Thanks for any help!

EDIT

My question is not a duplicate of "Constructing a line graph using ggplot2".

Though it is the same general problem - plotting several vectors of one dataframe and for that preparing the data to be usable with ggplot - the initial data is very different: mine is historical data that must be organized chronologically, thus the need to specify the levels by which the data will be organized on x. Plus the initial table is particular and required special treatment with gather.

Here is the whole working code for reference:

library(tidyverse)

# Reading into a tibble:
wheat <- read_csv("year,January,February,March,April,May,June,July,August,September,October,November,December
1845,,,,,,,,20.17,20.3,21.51,22.27,22.32
1846,22.36,22.65,22.42,22.26,22.48,22.93,22.92,24,24.9,25.97,27.59,28.01
1847,30.16,33.5,37.69,37.54,37.98,33.5,28.42,23.63,22.57,22.01,20.76,20.36
1848,20.01,19.34,18.12,16.59,16.58,15.88,15.67,,,,,")

# Tidying:
wheat_tidy <- wheat %>% gather(month, price, -year)

# Leveling:
wheat_tidy$month <- factor(wheat_tidy$month, levels = c("January","February","March","April","May","June","July","August","September","October","November","December"))

# Plotting:
wheat_tidy %>%
  ggplot(aes(x=month, y=price, group=year, color=as.factor(year))) +
  geom_line() +
  geom_point()
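
The tidying and leveling steps above can also be sketched with `pivot_longer()`, which the tidyr documentation recommends over `gather()` for new code, and with base R's built-in `month.name` vector in place of the hand-typed levels. The names `wheat_long` and `p` are my own choices for illustration, and this assumes tidyr 1.0.0 or later:

```r
library(readr)
library(tidyr)
library(ggplot2)

# Same data as above, passed as literal text (a string containing newlines).
wheat <- read_csv(paste(
  "year,January,February,March,April,May,June,July,August,September,October,November,December",
  "1845,,,,,,,,20.17,20.3,21.51,22.27,22.32",
  "1846,22.36,22.65,22.42,22.26,22.48,22.93,22.92,24,24.9,25.97,27.59,28.01",
  "1847,30.16,33.5,37.69,37.54,37.98,33.5,28.42,23.63,22.57,22.01,20.76,20.36",
  "1848,20.01,19.34,18.12,16.59,16.58,15.88,15.67,,,,,",
  sep = "\n"
))

# Wide to long: one row per (year, month) pair, empty cells kept as NA.
wheat_long <- pivot_longer(wheat, cols = -year,
                           names_to = "month", values_to = "price")

# month.name is base R's constant vector "January", ..., "December".
wheat_long$month <- factor(wheat_long$month, levels = month.name)

# One line per year, with points on top of the lines.
p <- ggplot(wheat_long,
            aes(x = month, y = price, group = year, color = factor(year))) +
  geom_line() +
  geom_point()
print(p)
```

The missing months stay in `wheat_long` as `NA`; ggplot2 should skip them when drawing, typically with a warning about removed rows.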
  • Related / possible duplicate: [*Stacked Bar Plot in R*](https://stackoverflow.com/q/20349929/2204410) – Jaap Aug 27 '19 at 15:28
  • You need to change your data from wide to long format. You can use `tidyr::gather` to do this: `df %>% gather(month,value,-year,factor_key = T) %>% ggplot(aes(month,value,group=factor(year),colour=factor(year))) + geom_line() + geom_point()` – kstew Aug 27 '19 at 15:59
  • Possible duplicate of [Constructing a line graph using ggplot2](https://stackoverflow.com/questions/19011959/constructing-a-line-graph-using-ggplot2) – kstew Aug 27 '19 at 16:42

1 Answer

Three problems here:

1) Your data is not tidy: the month is not a variable, it is just spread across the column names. You can use gather to help with that;

2) In your first aes() statement you need to define both x and y;

3) Just using group to define the year doesn't help much; you still need to define how each value in the group will be different -- for example, using color to make each year line a different color.

This code worked for me (EDIT: similar to kstew's comment above, which was posted while I was writing my answer):

library(tidyverse) # includes ggplot2

# Read the table from a literal string (rows separated by \n)
wheat <- read_delim("year,January,February,March,April,May,June,July,August,September,October,November,December\n1845,,,,,,,,20.17,20.3,21.51,22.27,22.32\n1846,22.36,22.65,22.42,22.26,22.48,22.93,22.92,24,24.9,25.97,27.59,28.01\n1847,30.16,33.5,37.69,37.54,37.98,33.5,28.42,23.63,22.57,22.01,20.76,20.36\n1848,20.01,19.34,18.12,16.59,16.58,15.88,15.67,,,,,", delim = ",")

# Reshape from wide to long: one row per year and month
df <- wheat %>%
  gather(theMonth, wheatValue, -year)

# One line per year, colored by year
plot <- ggplot(df, aes(x = theMonth, y = wheatValue, group = as.factor(year), color = as.factor(year))) +
  geom_line()
plot
mmyoung77
  • Great, @mmyoung77! I was able to get the same plot after tidying the data in excel by hand. Now you have shown me how to use gather to do the same, thank you! But the resulting plot is wrong: the x axis should have the months in chronological order, here they are in some strange order and consequently the lines are wrong. Any clue how to correct the order of the x axis? – user2696939 Aug 27 '19 at 19:47
  • OK, it seems that https://stackoverflow.com/questions/20041136/avoid-ggplot-sorting-the-x-axis-while-plotting-geom-bar has the answer to my ordering question. ggplot2 orders by default alphanumerically, this can be changed. – user2696939 Aug 27 '19 at 19:58
  • `as.factor` probably put the months in a wrong order due to the missing data. – mmyoung77 Aug 27 '19 at 19:58
  • I did indeed need to explicitly set the month names as levels using `wheat_tidy$month <- factor(wheat_tidy$month, levels = c("January","February","March","April","May","June","July","August","September","October","November","December"))` – user2696939 Aug 27 '19 at 20:08
  • Thanks, @mmyoung77, for your working solution! And thanks as well to the two commenters for their suggestions. Greatly appreciated! – user2696939 Aug 27 '19 at 20:12
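
The ordering issue discussed in these comments can be reproduced without any plotting. A minimal sketch in base R, using a hypothetical four-month vector `months_chr`: a plain character vector has no order of its own beyond alphabetical sorting, whereas a factor carries its levels in the order given (here the built-in `month.name`).

```r
# Hypothetical example vector, in calendar order.
months_chr <- c("January", "February", "March", "April")

# Sorting the distinct strings puts them in alphabetical order,
# which is the "strange order" seen on the x axis.
sort(unique(months_chr))

# A factor with explicit levels keeps calendar order instead.
months_fct <- factor(months_chr, levels = month.name)
levels(months_fct)

# The underlying integer codes follow the level order.
as.integer(months_fct)
```

ggplot2 places a factor on a discrete axis in level order, which is why setting the levels explicitly fixes the plot.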