I am wondering whether it is possible to add a legend when the plot contains two different sets of points.

samochodziki %>% 
  mutate(dopasowane = model_classic$fitted.values) %>% 
  arrange(lp100k) %>%
  mutate(index = 1:nrow(samochodziki)) %>%
  ggplot(aes(x = index)) +
  geom_point(aes(y = lp100k), color = "red") +
  geom_point(aes(y = dopasowane), color = "blue")

The data I am working with:

structure(list(lp100k = c(9.8006076285327, 11.2006944326088, 
7.4671296217392, 12.1244630456075, 13.0674768380436, 8.11084769257879
), cylinders = c(4L, 6L, 4L, 6L, 6L, 4L), displacement = c(1474.8354, 
3277.412, 1458.44834, 3801.79792, 3687.0885, 1114.32008), horsepower = c(75L, 
85L, 71L, 90L, 105L, 49L), weight = c(956.17271596, 1173.44346119, 
902.6488163, 1456.0315077, 1415.66178677, 846.85695479), acceleration = c(15.5, 
16, 14.9, 17.2, 16.5, 19.5), year = c(74L, 70L, 78L, 78L, 73L, 
73L), origin = c(2L, 1L, 2L, 1L, 1L, 2L), name = c("fiat 128", 
"ford maverick", "volkswagen scirocco", "amc concord", "plymouth valiant", 
"fiat 128")), .Names = c("lp100k", "cylinders", "displacement", 
"horsepower", "weight", "acceleration", "year", "origin", "name"
), row.names = c(NA, 6L), class = c("tbl_df", "tbl", "data.frame"
))

Summary of `model_classic`:

Call:
lm(formula = lp100k ~ horsepower + weight + year, data = samochodziki)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3229 -0.7818 -0.0626  0.6344  6.3843 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 22.8668314  1.8503686  12.358  < 2e-16 ***
horsepower   0.0151390  0.0042094   3.596 0.000375 ***
weight       0.0068079  0.0003937  17.294  < 2e-16 ***
year        -0.2950063  0.0230468 -12.800  < 2e-16 ***

So I want to add a legend that labels the red points "real" and the blue points "fitted".

  • This generally involves mapping aesthetics to constants inside `aes()`. See [this answer](https://stackoverflow.com/a/10355844/2461552) for an example. – aosmith May 29 '18 at 17:40
  • We don't have a copy of your data, but in any `ggplot` tutorial and many questions on SO, you'll see that `ggplot` generally expects long-shaped data. If you reshape the data appropriately, the common setup is to then map a variable to some aesthetic, in this case color, which will then eliminate the need for more than one `geom_point` and will place that aesthetic in a legend. – camille May 29 '18 at 17:46
  • OK, I have now added some important data. I know that `ggplot` likes long-shaped data, but in this case I don't know how to melt the data, so I thought I would add the legend manually. – Bartłomiej Fatyga May 29 '18 at 17:58
  • @BartłomiejFatyga: please post the output of `dput(samochodziki)` not `class` – Tung May 29 '18 at 18:00
  • Please use `dput` to post your data here. Otherwise we don't have a working copy of it. Someone can almost certainly help you reshape it once you post it – camille May 29 '18 at 18:01
  • The data is too long to paste, so here is the link: https://jpst.it/1fQhm – Bartłomiej Fatyga May 29 '18 at 18:11
  • Just share an illustrative sample. We don't need your whole data, 4-6 rows is plenty. Use `dput()` to post your data here, e.g., `dput(head(your_data))`. – Gregor Thomas May 29 '18 at 18:22
  • Edited. I have one more question, just out of curiosity: what is the advantage of using `dput`? – Bartłomiej Fatyga May 29 '18 at 19:22
  • `dput` is copy/pasteable and preserves structure and class information. If I share the `dput` of a data frame `dd` with you, you can copy/paste it into R and get the exact same data frame: here's my dput `dd = structure(list(date = structure(17532, class = "Date"), f = structure(1L, .Label = c("a", "b"), class = "factor"), char = "hi", x = 3), .Names = c("date", "f", "char", "x"), row.names = c(NA, -1L), class = "data.frame")`. It's a data frame, it has a Date column, a character column, a factor column... but I don't need to tell you any of that because it's all there effortlessly. – Gregor Thomas May 30 '18 at 03:39

1 Answer

I simplified the names a little (your data frame is `df` here) and recreated the model from your sample data. I added a column of the fitted values, renamed the measured values so the labels are a little neater after the gather, and then gathered the two lp columns into long format.

library(tidyverse)

model <- lm(lp100k ~ horsepower + weight + year, df)

df_long <- df %>% 
  mutate(lp_fitted = model$fitted.values) %>% 
  arrange(lp100k) %>%
  rename(lp_measured = lp100k) %>%
  mutate(index = 1:nrow(df)) %>%
  gather(key = type, value = lp100k, lp_measured, lp_fitted)

df_long
#> # A tibble: 12 x 11
#>    cylinders displacement horsepower weight acceleration  year origin
#>        <int>        <dbl>      <int>  <dbl>        <dbl> <int>  <int>
#>  1         4        1458.         71   903.         14.9    78      2
#>  2         4        1114.         49   847.         19.5    73      2
#>  3         4        1475.         75   956.         15.5    74      2
#>  4         6        3277.         85  1173.         16      70      1
#>  5         6        3802.         90  1456.         17.2    78      1
#>  6         6        3687.        105  1416.         16.5    73      1
#>  7         4        1458.         71   903.         14.9    78      2
#>  8         4        1114.         49   847.         19.5    73      2
#>  9         4        1475.         75   956.         15.5    74      2
#> 10         6        3277.         85  1173.         16      70      1
#> 11         6        3802.         90  1456.         17.2    78      1
#> 12         6        3687.        105  1416.         16.5    73      1
#> # ... with 4 more variables: name <chr>, index <int>, type <chr>,
#> #   lp100k <dbl>

Now that the data is in this format, plotting is easy: you can map `type` to the color aesthetic, so the `lp_measured` values get one color and the `lp_fitted` values get another, and ggplot builds the legend for you.

ggplot(df_long, aes(x = index, y = lp100k, color = type)) +
  geom_point() +
  scale_color_manual(values = c(lp_measured = "red", lp_fitted = "blue"))
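
For comparison, the approach mentioned in the comments (mapping color to constant strings inside `aes()`) should also give a legend without reshaping. The following is only a minimal sketch under some assumptions: the data frame is rebuilt by hand from rounded values of the six sample rows in the question, and the names `plot_df` and `p` are just illustrative.

```r
library(dplyr)
library(ggplot2)

# Assumption: values rounded from the six rows of the question's dput output
df <- data.frame(
  lp100k     = c(9.80, 11.20, 7.47, 12.12, 13.07, 8.11),
  horsepower = c(75, 85, 71, 90, 105, 49),
  weight     = c(956.2, 1173.4, 902.6, 1456.0, 1415.7, 846.9),
  year       = c(74, 70, 78, 78, 73, 73)
)
model <- lm(lp100k ~ horsepower + weight + year, data = df)

# Keep the data wide: one column of measured values, one of fitted values
plot_df <- df %>%
  mutate(lp_fitted = fitted(model)) %>%
  arrange(lp100k) %>%
  mutate(index = row_number())

# A constant string inside aes() becomes a legend entry;
# scale_color_manual() then assigns the actual colors to those labels
p <- ggplot(plot_df, aes(x = index)) +
  geom_point(aes(y = lp100k, color = "real")) +
  geom_point(aes(y = lp_fitted, color = "fitted")) +
  scale_color_manual(name = NULL, values = c(real = "red", fitted = "blue"))
p
```

This keeps the two `geom_point()` layers from the question, at the cost of spelling out each label by hand; the long-format version above scales better once there are more than two series.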

camille