
With this data frame ("df"):

  year pollution
1 1999 346.82000
2 2002 134.30882
3 2005 130.43038
4 2008  88.27546

I try to create a line chart like this:

  plot5 <- ggplot(df, aes(year, pollution)) +
           geom_point() +
           geom_line() +
           labs(x = "Year", y = "Particulate matter emissions (tons)", title = "Motor vehicle emissions in Baltimore")

The error I get is:

geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?

The chart appears as a scatter plot even though I want a line chart. I tried to replace geom_line() with geom_line(aes(group = year)) but that didn't work.

In an answer I was told to convert year to a factor variable. I did and the problem persists. This is the output of str(df) and dput(df):

'data.frame':   4 obs. of  2 variables:
 $ year     : num  1 2 3 4
 $ pollution: num [1:4(1d)] 346.8 134.3 130.4 88.3
  ..- attr(*, "dimnames")=List of 1
  .. ..$ : chr  "1999" "2002" "2005" "2008"

structure(list(year = c(1, 2, 3, 4), pollution = structure(c(346.82, 
134.308821199349, 130.430379885892, 88.275457392443), .Dim = 4L, .Dimnames = list(
    c("1999", "2002", "2005", "2008")))), .Names = c("year", 
"pollution"), row.names = c(NA, -4L), class = "data.frame")
Gregor Thomas
megashigger
  • It gives no error when I run it. It's likely that `df` is not what you think it is. Please state your question in reproducible form, i.e. show the output of `dput(df)`. – G. Grothendieck Nov 22 '14 at 21:27
  • could be that your variables are factors, then you'd need to convert them to numeric – erc Nov 22 '14 at 21:36
  • @G.Grothendieck I posted what you said. I also converted to numeric and still have the problem. – megashigger Nov 22 '14 at 21:44
  • You really should state questions in reproducible form. It's hard to help you if we can't recreate the error. – Mario Becerra Apr 24 '18 at 20:37
  • is it possible to rank the line point in descending order of "pollution"? – AdIan Mar 29 '21 at 07:12

6 Answers

516

You only need to add group = 1 to the aes() of either ggplot() or geom_line().

For line graphs, the data points must be grouped so that ggplot2 knows which points to connect. In this case it is simple: all points should be connected, so group = 1. When more variables are used and multiple lines are drawn, the grouping for lines is usually done by variable.

Reference: Cookbook for R, Chapter: Graphs Bar_and_line_graphs_(ggplot2), Line graphs.

Try this:

plot5 <- ggplot(df, aes(year, pollution, group = 1)) +
         geom_point() +
         geom_line() +
         labs(x = "Year", y = "Particulate matter emissions (tons)", 
              title = "Motor vehicle emissions in Baltimore")
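
To see what group = 1 changes, one can inspect the group column that ggplot2 computes for the line layer. This is only a minimal sketch, assuming year is stored as a factor; the data frame name toy and the rounded values are made up for illustration, and layer_data() is the ggplot2 helper that returns a layer's computed data.

```r
library(ggplot2)

# Toy data (illustrative): a discrete x axis, as when year is a factor
toy <- data.frame(
  year = factor(c(1999, 2002, 2005, 2008)),
  pollution = c(346.82, 134.31, 130.43, 88.28)
)

# Without an explicit group, each factor level becomes its own group
p_default <- ggplot(toy, aes(year, pollution)) + geom_line()

# With group = 1, all rows fall into a single group and can be connected
p_single <- ggplot(toy, aes(year, pollution, group = 1)) + geom_line()

# Count the distinct groups ggplot2 assigned in each case
n_groups_default <- length(unique(layer_data(p_default)$group))
n_groups_single  <- length(unique(layer_data(p_single)$group))
```

With one observation per group there is nothing for geom_line() to connect, which is what the message reports.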
Brad Larson
Mario Barbé
  • Of note, grouping has to be done with the `group` argument. Grouping only by, e.g., `color` would not be sufficient. I just had this trouble and hope this helps someone running into the same issue. – tjebo Aug 09 '19 at 13:39
  • Is this answer still valid? Adding group=1 in the aesthetics doesn't seem to be working anymore. – Giacomo Apr 22 '20 at 10:09
  • @Giacomo -- works for me, on 3.6.2 on a Mac. Was getting the dreaded warning, but adding group=1 fixed the problem. `ggplot(lakemeta, mapping=aes(x=Lake, y=Area, group=1)) + geom_line(size=2, color="blue")` – Jenn D. May 24 '20 at 21:49
  • is it possible to rank the point in descending order of "pollution"? – AdIan Mar 29 '21 at 07:13
  • @AndyIan Yes. A simple dplyr solution would be: `df %>% arrange(pollution) %>% ggplot()` – Pss Jan 29 '22 at 18:57
  • Is there any reason `geom_line()` couldn't *assume* `group = 1` if it's omitted? – stevec Jul 17 '22 at 11:38
49

You get this error because one of your variables is actually a factor variable. Run

str(df) 

to check this. Then convert in two steps, via character, so that you keep the actual year values instead of the internal level codes 1, 2, 3, 4:

df$year <- as.numeric(as.character(df$year))
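
The reason for the two-step conversion can be checked in base R. This is a small illustrative sketch; the variable names year_factor, codes and values are made up for the example.

```r
# A factor holding the four years from the question
year_factor <- factor(c(1999, 2002, 2005, 2008))

# Direct conversion returns the internal level codes, not the years
codes <- as.numeric(year_factor)

# Going through character first parses the labels back into numbers
values <- as.numeric(as.character(year_factor))

print(codes)   # 1 2 3 4
print(values)  # 1999 2002 2005 2008
```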

EDIT: it appears that your data.frame has a variable of class "array", which might cause the problem. Then try:

df <- data.frame(apply(df, 2, unclass))

and plot again?
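
The array column can be examined without ggplot2. The following is a hedged sketch rather than a definitive fix: it rebuilds a one-dimensional array like the one in the question's dput() output (values rounded), shows that as.numeric() drops the dim and dimnames attributes, and recovers the years from the names. The names plain, years and fixed are made up for illustration. Such arrays are, for instance, what tapply() returns.

```r
# A 1-d array with the years as dimnames, like the pollution column
pollution <- structure(
  c(346.82, 134.31, 130.43, 88.28),
  .Dim = 4L,
  .Dimnames = list(c("1999", "2002", "2005", "2008"))
)

is.array(pollution)             # TRUE: not a plain vector

# as.numeric() strips the attributes and leaves a plain numeric vector
plain <- as.numeric(pollution)
is.array(plain)                 # FALSE

# For a 1-d array, names() returns the first dimnames entry
years <- as.numeric(names(pollution))

# Rebuilt data frame with plain numeric columns
fixed <- data.frame(year = years, pollution = plain)
```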

agenis
9

I had a similar problem with this data frame:

    group time weight.loss
1 Control  wl1    4.500000
2    Diet  wl1    5.333333
3  DietEx  wl1    6.200000
4 Control  wl2    3.333333
5    Diet  wl2    3.916667
6  DietEx  wl2    6.100000
7 Control  wl3    2.083333
8    Diet  wl3    2.250000
9  DietEx  wl3    2.200000

I think the variable for the x axis should be numeric, so that geom_line knows how to connect the points to draw the line.

After I changed the second column to numeric:

    group time weight.loss
1 Control    1    4.500000
2    Diet    1    5.333333
3  DietEx    1    6.200000
4 Control    2    3.333333
5    Diet    2    3.916667
6  DietEx    2    6.100000
7 Control    3    2.083333
8    Diet    3    2.250000
9  DietEx    3    2.200000

Then it works.
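
An alternative that keeps time as a categorical axis is to map the group aesthetic to the grouping column, in line with the accepted answer's explanation. This is a sketch under that assumption, using the values from the first table above; the names wl, p and n_lines are made up for illustration.

```r
library(ggplot2)

# The data frame from this answer, with time left as text
wl <- data.frame(
  group = rep(c("Control", "Diet", "DietEx"), times = 3),
  time = rep(c("wl1", "wl2", "wl3"), each = 3),
  weight.loss = c(4.500000, 5.333333, 6.200000,
                  3.333333, 3.916667, 6.100000,
                  2.083333, 2.250000, 2.200000)
)

# group = group tells geom_line() to draw one line per treatment group
p <- ggplot(wl, aes(time, weight.loss, group = group, colour = group)) +
  geom_point() +
  geom_line()

# Layer 2 is geom_line(); count the groups it will connect
n_lines <- length(unique(layer_data(p, 2)$group))
```

Here each of the three groups has three observations, so each line has points to connect.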

Xin Niu
1

Start up R in a fresh session and paste this in:

library(ggplot2)

df <- structure(list(year = c(1, 2, 3, 4), pollution = structure(c(346.82, 
134.308821199349, 130.430379885892, 88.275457392443), .Dim = 4L, .Dimnames = list(
    c("1999", "2002", "2005", "2008")))), .Names = c("year", 
"pollution"), row.names = c(NA, -4L), class = "data.frame")

df[] <- lapply(df, as.numeric) # make all columns numeric

ggplot(df, aes(year, pollution)) +
           geom_point() +
           geom_line() +
           labs(x = "Year", 
                y = "Particulate matter emissions (tons)", 
                title = "Motor vehicle emissions in Baltimore")
G. Grothendieck
  • Start up R in a fresh session and paste the code in my post into it. – G. Grothendieck Nov 22 '14 at 22:15
  • Have you figured out this problem? I have the same problem as you: I have only one value for each x value. Waiting for your response. Thanks. – Hoang Le Jul 27 '17 at 11:06
  • Can you explain why converting everything to numeric fixes the issue? My ordered factor variable is a character one, so I can't use numerics in its stead. – Rafs Nov 19 '20 at 16:35
  • `pollution` is a 1d array rather than a plain vector. Look at `str(df)` – G. Grothendieck Nov 19 '20 at 16:43
1

I got a similar message. It was because I had specified the x-axis values as percentage labels (for example: 10%A, 20%B, ...). An alternative approach is to multiply these values out and write them in their simplest form.

Areeha
1

I found this can also occur if most of the plotted data is outside the axis limits. In that case, adjust the axis scales accordingly.

qwr