I have an Excel file with data for 3 countries and their numbers of marriages and divorces from 1960 to 2019. I need to create a graph with the years on the x-axis and the value of each variable for each country on the y-axis. I have tried the code below, but I can't make it work and I'm not sure what I'm doing wrong. (I have to use ggplot; it's a class requirement.)

library(ggplot2)
Anos <- factor(CASDIV$Ano)
names(CASDIV)



colors <- c("Casamentos: Croácia" = "Blue", "Casamentos: Irlanda" = "Orange",
            "Casamentos: Malta" = "Yellow", "Divórcios: Croácia" = "red",
            "Divórcios: Irlanda" = "Green", "Divórcios: Malta" = "Brown")

ggplot(CASDIV, aes(x= Ano))+
  geom_line(data=subset(CASDIV, País== "HR - Croácia"), aes(y=CASDIV$Casamentos, color = "Casamentos : Croácia"), size = 0,01)+
  geom_line(data=subset(CASDIV, País== "IE - Irlanda"), aes(y=CASDIV$Casamentos, color = "Casamentos : Irlanda"), size=0,01)+
  geom_line(data = subset(CASDIV, País=="MT - Malta"), aes(y=CASDIV$Casamentos, color = "casamentos: Malta"), size=0,01)+
  geom_line(data=subset(CASDIV, País== "HR - Croácia"), aes(y=CASDIV$Divórcios, color = "Divórcios : Croácia"), size=0,01)+
  geom_line(data=subset(CASDIV, País == "IE - Irlanda"), aes(y=CASDIV$Divórcios, color = "Divórcios : Irlanda"), size=0,01)+
  geom_line(data=subset(CASDIV, País=="MT - Malta"), aes(y=CASDIV$Divórcios, color = "Divórcios : Malta"), size=0,01)+
  labs(x="Anos", y= "Valor", Colour = "Legenda") +
  scale_color_manual(values= colors)

MRE:

2017,HR - Croácia,20310,6265
2018,HR - Croácia,19921,6125
2019,HR - Croácia,19761,5936
2017,IE - Irlanda,22021,0
2018,IE - Irlanda,21053,0
2019,IE - Irlanda,20313,0
2016,MT - Malta,3034,371
2017,MT - Malta,2934,312
2018,MT - Malta,2831,349
2019,MT - Malta,2674,354
Phil
  • Welcome to SO! Would you mind providing [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data or some fake data? To post your data, type `dput(NAME_OF_DATASET)` into the console and copy the output starting with `structure(....` into your post. If your dataset has a lot of observations you could do e.g. `dput(head(NAME_OF_DATASET, 10))` for the first 10 rows of data. – stefan Dec 23 '21 at 16:06
  • ... this said: As a general rule get rid of `CASDIV$...`. Simply use e.g. `aes(y=Casamentos, ..)`. As far as I can tell using `CASDIV$Casamentos` will probably result in an error. – stefan Dec 23 '21 at 16:08
  • So, this is an excerpt of the data I'm using: `dput(head(CASDIV$Casamentos,5))` gives `c(36761, 36634, 36149, 33976, 35965)`, `dput(head(CASDIV$Divórcios,5))` gives `c(4811, 5057, 4883, 5114, 5217)`, and `dput(head(Anos,5))` gives `structure(1:5, .Label = c("1960", "1961", "1962", "1963", "1964", "1965", "1966", "1967", "1968", "1969", "1970", "1971", "1972", "1973", "1974", "1975", "1976", "1977", "1978", "1979", "1980", ), class = "factor")`. Sorry for the bad formatting, I cut some of the years in the Anos class to fit, thanks – Joel Alexandre Dec 23 '21 at 16:27
  • When I run this I receive an "Error: `stat` must be either a string or a Stat object, not a numeric vector" notification, but I'm not using stat – Joel Alexandre Dec 23 '21 at 16:31

1 Answer

There are several issues with your code. First, as I already mentioned in my comment, you use aes(y = CASDIV$...) in your code, which is not recommended and which in your case will result in an error: each layer's data is a subset of CASDIV, while CASDIV$Casamentos still has one value per row of the full data. Second, you use a comma as the decimal separator in size=0,01. R reads this as two arguments, size = 0 and an unnamed 01, and the unnamed value ends up matched to the stat argument of geom_line(). That is the reason for the mysterious "Error: stat must be either a string or a Stat object, not a numeric vector" notification. Always use . as the decimal mark. Finally, while you took the right approach for the colors, you have to make sure that the labels you use inside aes() are exactly the same as the names in your color vector.
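To make the comma issue concrete, here is a minimal sketch of how R matches the arguments in a call like the one in the question. `toy_geom()` is a hypothetical stand-in written for this illustration; it only mimics the order of the first three arguments of geom_line() (mapping, data, stat) and is not part of ggplot2.

```r
# Hypothetical stand-in (not a ggplot2 function): same leading argument
# order as geom_line(), i.e. mapping, data, stat, then "...".
toy_geom <- function(mapping = NULL, data = NULL, stat = "identity", ...) {
  list(mapping = mapping, data = data, stat = stat, extra = list(...))
}

# Same shape as the call in the question: data is named, the mapping is
# unnamed, and the decimal comma splits "0,01" into `size = 0` and `01`.
res <- toy_geom(data = "subset of CASDIV", "aes(...)", size = 0,01)

str(res)
# res$stat is the number 1 (R parses 01 as 1), and res$extra$size is 0.
```

The unnamed `01` is matched by position to the first argument that is still free, which is stat. With size = 0.01 there is only one value, and it goes where it was meant to go.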

Note: A size of 0.01 makes the lines nearly invisible so I switched to 0.1.

Using some fake random data to mimic your real data:

library(ggplot2)

set.seed(123)

CASDIV <- data.frame(
  Ano = seq(1960, 2020, 10),
  País = rep(c("HR - Croácia", "IE - Irlanda", "MT - Malta"), each = 7),
  Casamentos = runif(21),
  Divórcios = runif(21)
)
Anos <- factor(CASDIV$Ano)

colors <- c("Casamentos: Croácia" = "Blue", "Casamentos: Irlanda" = "Orange",
            "Casamentos: Malta" = "Yellow", "Divórcios: Croácia" = "red",
            "Divórcios: Irlanda" = "Green", "Divórcios: Malta" = "Brown")

ggplot(CASDIV, aes(x= Ano))+
  geom_line(data=subset(CASDIV, País== "HR - Croácia"), aes(y=Casamentos, color = "Casamentos: Croácia"), size = 0.1)+
  geom_line(data=subset(CASDIV, País== "IE - Irlanda"), aes(y=Casamentos, color = "Casamentos: Irlanda"), size=0.1)+
  geom_line(data = subset(CASDIV, País=="MT - Malta"), aes(y=Casamentos, color = "Casamentos: Malta"), size=0.1)+
  geom_line(data=subset(CASDIV, País== "HR - Croácia"), aes(y=Divórcios, color = "Divórcios: Croácia"), size=0.1)+
  geom_line(data=subset(CASDIV, País == "IE - Irlanda"), aes(y=Divórcios, color = "Divórcios: Irlanda"), size=0.1)+
  geom_line(data=subset(CASDIV, País=="MT - Malta"), aes(y=Divórcios, color = "Divórcios: Malta"), size=0.1)+
  labs(x="Anos", y= "Valor", color = "Legenda") +
  scale_color_manual(values= colors)
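
The label-matching point from the start of the answer can be checked before plotting. This is a small sketch in base R; the vectors `labels_question` and `labels_fixed` are names made up for this illustration and simply collect the strings passed to color = inside aes() in the question and in the corrected code.

```r
colors <- c("Casamentos: Croácia" = "Blue", "Casamentos: Irlanda" = "Orange",
            "Casamentos: Malta" = "Yellow", "Divórcios: Croácia" = "red",
            "Divórcios: Irlanda" = "Green", "Divórcios: Malta" = "Brown")

# Labels as written in the question (extra space before ":" and a
# lower-case "casamentos")
labels_question <- c("Casamentos : Croácia", "Casamentos : Irlanda",
                     "casamentos: Malta", "Divórcios : Croácia",
                     "Divórcios : Irlanda", "Divórcios : Malta")

# Labels as written in the corrected code
labels_fixed <- c("Casamentos: Croácia", "Casamentos: Irlanda",
                  "Casamentos: Malta", "Divórcios: Croácia",
                  "Divórcios: Irlanda", "Divórcios: Malta")

# Labels with no matching name in the color vector
missing_question <- setdiff(labels_question, names(colors))
missing_fixed <- setdiff(labels_fixed, names(colors))

length(missing_question)  # all six labels from the question are unmatched
length(missing_fixed)     # none are unmatched after the fix
```

Any label returned by setdiff() has no entry in the manual color scale, so it is worth running a check like this whenever the labels are typed by hand.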

While the corrected plotting code works, it is not the most concise. With some data wrangling you could simplify it considerably:

library(tidyr)
library(dplyr)

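# Reshape to long format: one row per year, country and variable
CASDIV_long <- CASDIV %>% 
  pivot_longer(-c(Ano, País)) %>% 
  # substring(País, 5) drops the country code but keeps the leading space,
  # so the labels match the names in `colors`, e.g. "Casamentos: Croácia"
  mutate(color = paste(name, substring(País, 5), sep = ":"))

ggplot(CASDIV_long, aes(x= Ano)) + 
  geom_line(aes(y = value, color = color), size = .1) +
  labs(x="Anos", y= "Valor", color = "Legenda") +
  scale_color_manual(values= colors)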

Created on 2021-12-23 by the reprex package (v2.0.1)
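
To see what the reshaping step produces, here is a sketch that applies it to the ten rows posted in the question. The MRE has no header row, so the column names Ano, País, Casamentos and Divórcios are an assumption taken from the question's code, and `mre` and `mre_long` are names chosen for this illustration.

```r
library(tidyr)
library(dplyr)

# The ten MRE rows from the question; column names are assumed
mre <- data.frame(
  Ano = c(2017, 2018, 2019, 2017, 2018, 2019, 2016, 2017, 2018, 2019),
  País = rep(c("HR - Croácia", "IE - Irlanda", "MT - Malta"),
             times = c(3, 3, 4)),
  Casamentos = c(20310, 19921, 19761, 22021, 21053, 20313,
                 3034, 2934, 2831, 2674),
  Divórcios = c(6265, 6125, 5936, 0, 0, 0, 371, 312, 349, 354)
)

# Each input row becomes two rows: one for Casamentos, one for Divórcios.
# `name` holds the variable and `value` holds the number.
mre_long <- mre %>%
  pivot_longer(-c(Ano, País)) %>%
  mutate(color = paste(name, substring(País, 5), sep = ":"))

head(mre_long, 4)
unique(mre_long$color)
```

The result has 20 rows and the columns Ano, País, name, value and color, where color takes six distinct values such as "Casamentos: Croácia". Because every line is now identified by a single column, one geom_line() call with color = color replaces the six separate layers.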

stefan
  • Thanks!! I can't upvote your answer (not enough reputation), but thanks for the help; I understand now what I did wrong. In all honesty, I don't fully comprehend the more efficient way you suggested yet, but hopefully with more experience I'll be able to be more efficient. – Joel Alexandre Dec 23 '21 at 17:17