
The data frame currently looks like this:

EDIT: structure

library(data.table)
library(dplyr)
library(tibble)

But I get the following error: "Each group consists of only one observation".

If so, how can I get a line graph that plots each column's value by month?

Also, I am not sure how to select more than one region in the ggplot aes() bit. I tried using c() to no avail. Any help and newbie-friendly advice would be greatly appreciated!

Lactuca
  • To help us to help you could you please make your issue reproducible by sharing a sample of your **data**, the **code** you tried and the **packages** you used? See [how to make a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) Please do not use str(), head() or a screenshot to post your data. Simply type `dput(NAME_OF_DATASET)` into the console and copy & paste the output starting with `structure(....` into your post. – stefan Jan 04 '21 at 10:57
  • BTW: At least for the code you showed try with adding the `group=1` aesthetic. – stefan Jan 04 '21 at 10:58
  • Thanks for the heads up, I didn't know about that dput command. I've just added what you've asked, trimming the data frame to a 3x3 table. Hope it works! p.s. I tried adding group = 1 but I get the same error message – Lactuca Jan 04 '21 at 11:06

3 Answers


There are a few parts to your question:

  1. To fix your error right away, add a group = 1 argument to the geom_line() call.

    1a. This is because geom_line() can draw several 'groups' of lines that are disconnected from each other. In this initial case, all the points belong to the same group and you want to connect them all with one line.

  2. To plot multiple groups, first reshape your data into long format, which makes it much easier to work with in ggplot. To do this, include a line like:

     tidyr::pivot_longer(data, -Date, values_to = "value_on_date", names_to = "region")
    

This generates a long-format version of your data.
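To make the reshaping concrete, here is a small sketch that uses the three-row sample posted in this thread. The object names `df` and `df_long`, and building the data with `data.frame(..., check.names = FALSE)`, are my own choices for illustration rather than the asker's actual code.

```r
library(tidyr)

# Sample values copied from the thread; check.names = FALSE keeps the
# spaces in the column names instead of converting them to dots.
df <- data.frame(
  Date = c("01-2019", "02-2019", "03-2019"),
  `North East` = c(5.05625777763551, 5.58119346747183, 5.41295614949722),
  London = c(4.2102766429572, 4.45850956493638, 4.36960549219723),
  `West Midlands` = c(5.0708122696351, 5.20425572086481, 5.07463979478007),
  check.names = FALSE
)

# Stack every column except Date: the old column names go into `region`
# and the cell values go into `value_on_date`.
df_long <- pivot_longer(df, -Date,
                        names_to = "region",
                        values_to = "value_on_date")

dim(df)       # wide: one row per date
dim(df_long)  # long: one row per date and region
print(df_long)
```

Each date now appears once per region, which is the one-row-per-observation layout that ggplot works best with.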

You can then alter your code to something like:

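library(dplyr)    # provides the %>% pipe
library(ggplot2)

# Reshape to long format, then draw one line per region
df %>%
  tidyr::pivot_longer(-Date, values_to = "value_on_date", names_to = "region") %>%
  ggplot(aes(Date, value_on_date)) +
  geom_line(aes(group = region)) +
  labs(x = "Date", y = "Value on date")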

This will then show three lines, one for each region.

2a. Note that in geom_line() the group is now inside an aes() call. This is because the group now varies with the data rather than being a constant. The same pattern applies throughout ggplot.

2b. Another principle in ggplot is that each row should hold one observation, with all its associated detail. In the data you showed, each row actually held three values, one per region, which doesn't work well with ggplot.

2c. You can then extend this by adding something like colour = region to the aes() call, to show more clearly which region is which.
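As a rough sketch of point 2c, the plot below maps region to both group and colour. It rebuilds the thread's sample data so that it runs on its own; `df`, `df_long` and `p` are illustrative names, and the legend title set in `labs()` is my own assumption about what you would want to show.

```r
library(ggplot2)
library(tidyr)

# Sample values copied from the thread
df <- data.frame(
  Date = c("01-2019", "02-2019", "03-2019"),
  `North East` = c(5.05625777763551, 5.58119346747183, 5.41295614949722),
  London = c(4.2102766429572, 4.45850956493638, 4.36960549219723),
  `West Midlands` = c(5.0708122696351, 5.20425572086481, 5.07463979478007),
  check.names = FALSE
)

df_long <- pivot_longer(df, -Date,
                        names_to = "region",
                        values_to = "value_on_date")

# group = region draws one line per region; colour = region gives each
# line its own colour and adds a legend.
p <- ggplot(df_long, aes(Date, value_on_date, group = region, colour = region)) +
  geom_line() +
  labs(x = "Date", y = "Value on date", colour = "Region")

print(p)
```

To plot only some of the regions, one option is to filter the long data first, e.g. `df_long[df_long$region %in% c("London", "North East"), ]`, and pass that to ggplot() instead.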

Hope this helps, and as stefan said, including a minimal reproducible example of how to get to your session state helps anyone looking to answer your question.

J Osborne

The issue is that your x-axis variable is a character (i.e. categorical) variable. In that case ggplot2 by default uses this variable to group the data, so each group contains only one observation. You therefore have to tell ggplot2 which grouping you want, which can be done with group=1. This means ggplot2 should treat all observations as belonging to one group, which for simplicity we call 1.
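If you want to see the grouping for yourself, the sketch below compares the groups ggplot2 computes with and without group = 1, using only the London column from the thread's sample. It relies on `layer_data()`, which returns the data ggplot2 built for a layer, and the object names are made up for illustration.

```r
library(ggplot2)

# One region from the thread's sample, with Date stored as character
df <- data.frame(
  Date = c("01-2019", "02-2019", "03-2019"),
  London = c(4.2102766429572, 4.45850956493638, 4.36960549219723)
)

# Default: the character x variable defines the groups,
# so every date ends up in its own group.
p_default <- ggplot(df, aes(Date, London)) + geom_line()

# Constant group: all rows share one group, so the points can be joined.
p_grouped <- ggplot(df, aes(Date, London, group = 1)) + geom_line()

# Count the distinct groups in the built layer data of each plot
groups_default <- length(unique(layer_data(p_default)$group))
groups_grouped <- length(unique(layer_data(p_grouped)$group))

cat("groups without group = 1:", groups_default, "\n")
cat("groups with group = 1:   ", groups_grouped, "\n")
```

The first count should equal the number of dates and the second should be one, which is why only the second plot draws a connected line.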

To get a line plot for all your regions it's best to reshape your data to long format using e.g. tidyr::pivot_longer, which gives us two new columns: one with the names of the categories or regions (called name by default) and one with the corresponding values (called value by default). After reshaping you can map the values on y and group by region using group=name.

library(dplyr)
library(tidyr)
library(ggplot2)

df <- structure(list(Date = c("01-2019", "02-2019", "03-2019"), `North East` = c(
  5.05625777763551,
  5.58119346747183, 5.41295614949722
), London = c(
  4.2102766429572,
  4.45850956493638, 4.36960549219723
), `West Midlands` = c(
  5.0708122696351,
  5.20425572086481, 5.07463979478007
)), row.names = c(NA, 3L), class = "data.frame")

df_long <- df %>%
  pivot_longer(-Date)

ggplot(df_long, aes(Date, value, color = name, group = name)) +
  geom_line() +
  labs(x = "Date", y = "Value", color = "Region")

stefan
  • It works perfectly, thanks a lot both for the code and the thorough explanation! Can I just ask you how come you included "-Date" in pivot_longer? df_long <- df %>% pivot_longer(-Date). – Lactuca Jan 04 '21 at 11:41
  • If not using `-Date` the date would be treated as a fourth region category, therefore with `-Date` I tell pivot_longer to reshape all columns except for `Date`. – stefan Jan 04 '21 at 11:46
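To illustrate what `-Date` selects, here is a small sketch on the thread's sample data. It compares `-Date` with spelling out the three region columns by name; the object names are hypothetical and the comparison is only meant to show that both selections pick the same columns.

```r
library(tidyr)

# Sample values copied from the thread
df <- data.frame(
  Date = c("01-2019", "02-2019", "03-2019"),
  `North East` = c(5.05625777763551, 5.58119346747183, 5.41295614949722),
  London = c(4.2102766429572, 4.45850956493638, 4.36960549219723),
  `West Midlands` = c(5.0708122696351, 5.20425572086481, 5.07463979478007),
  check.names = FALSE
)

# "Everything except Date" ...
long_minus <- pivot_longer(df, -Date)

# ... selects the same columns as listing the regions explicitly.
long_named <- pivot_longer(df, c("North East", "London", "West Midlands"))

same_result <- identical(long_minus, long_named)
print(same_result)
```

The `-Date` form is usually more convenient because it keeps working if more region columns are added later.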

Adding a group aesthetic will fix this error.

Add group = 1 to the aes() in your ggplot() call.