I need to mark the biggest difference on a plot

Question

First of all, sorry for my English.

I have two different objects: a factor and a numeric. I plot geom_point() and geom_line() with x = year and trimester (as a factor) and y = value (numeric).

I split the y values into two groups with group = a factor variable.

So I have two lines. I need to draw 3 vertical lines: at the position where the difference between both lines is biggest, and where it is smallest too.

I have seen that I can draw a line with geom_segment or geom_line. But I need the start and the end, so I need an x reference, and my x is a factor, not a numeric.

So, what can I do?

I have this (the original plot image is not available):

And I need something like this (page 3, first graphic):

https://riull.ull.es/xmlui/bitstream/handle/915/6574/A_08_%282017%29_07.pdf?sequence=1&isAllowed=y

And this is my data (first 20 rows) before I transform it with the melt function (by "AMBOS.SEXOS"):


structure(list(AMBOS.SEXOS = structure(20:1, .Label = c("2014TII", 
"2014TIII", "2014TIV", "2015TI", "2015TII", "2015TIII", "2015TIV", 
"2016TI", "2016TII", "2016TIII", "2016TIV", "2017TI", "2017TII", 
"2017TIII", "2017TIV", "2018TI", "2018TII", "2018TIII", "2018TIV", 
"2019TI"), class = "factor"), Activos = structure(c(18L, 20L, 
19L, 12L, 11L, 17L, 4L, 5L, 2L, 7L, 10L, 8L, 6L, 13L, 14L, 15L, 
16L, 9L, 1L, 3L), .Label = c("1.086,16", "1.089,43", "1.091,95", 
"1.094,48", "1.094,62", "1.097,06", "1.100,27", "1.100,74", "1.100,83", 
"1.102,15", "1.107,86", "1.108,98", "1.110,40", "1.110,65", "1.110,78", 
"1.114,98", "1.118,25", "1.130,20", "1.131,53", "1.141,58"), class = "factor"), 
    Ocupados = structure(c(18L, 20L, 19L, 17L, 16L, 15L, 14L, 
    13L, 8L, 12L, 11L, 7L, 9L, 10L, 6L, 5L, 4L, 3L, 1L, 2L), .Label = c("723,87", 
    "735,09", "758,67", "771,46", "774,24", "793,48", "799,91", 
    "809,66", "811,85", "813,34", "815,45", "826,28", "828,61", 
    "855,17", "871,81", "879,46", "886,57", "892,47", "909,26", 
    "913,36"), class = "factor")), row.names = c(NA, 20L), class = "data.frame")


You mentioned three vertical lines: one at the biggest difference, another at the smallest, and... the last one? — Rodrigo Orellana, Jul 31 '19 at 16:27
Please share a small sample data set so we can see the problem and show a solution. — Gregor Thomas, Jul 31 '19 at 16:29
Yes. A line or whatever. I need to mark the biggest and the smallest difference, like a milestone. — Csf, Jul 31 '19 at 16:29
If you need tips on sharing data, [this answer is very good](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Something like `dput(droplevels(your_data[1:20, ]))` is a good way to share the first 20 rows. If you could share 5-10 rows of each factor level, that would give us something to work with. — Gregor Thomas, Jul 31 '19 at 16:53
You say that y is numeric, but your `dput` nicely shows that you have read all these numbers in as factors instead. Use `read.csv2` instead of `read.csv` etc. — Axeman, Jul 31 '19 at 18:20
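Here is a minimal sketch of the read.csv2 suggestion. It assumes a semicolon-separated file with comma decimals, simulated with the text argument and a few values from the Ocupados column above. read.csv2 sets sep = ";" and dec = ",", so such values are read in as numbers rather than factors. Values that also carry a "." thousands separator, like those in Activos, may still need extra cleaning.

```r
# Simulated file in Spanish format: ";" separates fields, "," marks decimals.
# The column names mirror the question's data; the file itself is hypothetical.
txt <- "AMBOS.SEXOS;Ocupados
2019TI;892,47
2018TIV;913,36
2018TIII;909,26"

datos <- read.csv2(text = txt)

# Ocupados should now be numeric, not a factor
str(datos)
```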

Matias Andina · Answer 1 · 2019-08-01T17:41:17.487

This is what you want. It looks like you need some str_replace calls; they are a quick and dirty way to transform numbers written like 1.000,50 into my local numeric standard, 1000.50.

I named your data.frame qq. First, compute the differences.

library(dplyr)
library(stringr)
library(ggplot2)

qq <- qq %>% mutate(
  # "1.086,16" -> "1086,16" -> "1086.16"; str_replace_all drops every thousands dot
  Activos = str_replace_all(as.character(Activos), "\\.", ""),
  Activos = str_replace(Activos, ",", "."),
  Ocupados = str_replace_all(as.character(Ocupados), "\\.", ""),
  Ocupados = str_replace(Ocupados, ",", "."),
  Activos = as.numeric(Activos),
  Ocupados = as.numeric(Ocupados),
  diferencia = Activos - Ocupados,
  # create false x axis for plotting purposes
  # double check, looks like your data is ordered
  # with the most recent first, we will need to account for that
  # (-desc(x) is just x: the factor's integer level codes)
  falso_x = -desc(as.numeric(AMBOS.SEXOS)))
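For comparison, here is a base R sketch of the same number conversion, without stringr. The helper name parse_es is made up for this example. gsub() with fixed = TRUE replaces every match literally, so numbers with several thousands separators (such as 1.234.567,89) are handled too. The as.character() step matters, because as.numeric() on a factor returns the level codes, not the values.

```r
# Hypothetical helper: converts Spanish-formatted numbers ("1.086,16") to numeric
parse_es <- function(x) {
  x <- as.character(x)                  # factors -> their labels
  x <- gsub(".", "", x, fixed = TRUE)   # drop every thousands separator
  x <- gsub(",", ".", x, fixed = TRUE)  # comma decimal -> dot decimal
  as.numeric(x)
}

# First three Activos values from the question's data, stored as a factor
activos <- factor(c("1.130,20", "1.141,58", "1.131,53"))

as.numeric(activos)  # 1 3 2: level codes, not the values
parse_es(activos)
```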


# row with the smallest Activos - Ocupados difference
minimo <- qq %>% arrange(diferencia) %>%
  head(n=1)

# row with the biggest difference
maximo <- qq %>% arrange(desc(diferencia)) %>%
  head(n=1)
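In base R, the extreme rows can also be picked directly with which.min() and which.max(). The sketch below uses the first three rows of the question's data, already converted to numbers. qq_demo, minimo_demo and maximo_demo are illustrative names. On ties, which.min() and which.max() return the first matching row. In dplyr, top_n() and (from dplyr 1.0.0) slice_min()/slice_max() do the same job but keep tied rows by default.

```r
# First three rows of the question's data, already converted to numbers
qq_demo <- data.frame(
  AMBOS.SEXOS = c("2019TI", "2018TIV", "2018TIII"),
  Activos     = c(1130.20, 1141.58, 1131.53),
  Ocupados    = c(892.47, 913.36, 909.26)
)
qq_demo$diferencia <- qq_demo$Activos - qq_demo$Ocupados

# Base R equivalent of arrange() %>% head(n = 1): index the extreme rows directly
minimo_demo <- qq_demo[which.min(qq_demo$diferencia), ]
maximo_demo <- qq_demo[which.max(qq_demo$diferencia), ]

minimo_demo
maximo_demo
```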

Now make the plot. I didn't go the reshape2::melt way; although it's possible, it might be more cumbersome in your case. (You can try reshape2::melt(qq, id.vars=c("AMBOS.SEXOS", "falso_x")) and then filter out the diferencia values before making the plot.)

The trick is to use the false x axis and then put the labels manually.
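The base R sketch below shows why the false x axis lines up with its labels. It uses three trimesters from the question, ordered most recent first as in the data. as.integer() on a factor returns each value's level position, and indexing levels() with the breaks gives the label drawn at each position. Using levels(qq$AMBOS.SEXOS) as the labels is an alternative to rev(qq$AMBOS.SEXOS) that does not rely on the rows being in exact reverse order. That is a suggestion here, not part of the original answer.

```r
# Three trimesters, most recent first (as in the question's data)
trimestre <- factor(c("2019TI", "2018TIV", "2018TIII"),
                    levels = c("2018TIII", "2018TIV", "2019TI"))

falso_x <- as.integer(trimestre)      # positions on the false x axis
breaks  <- seq_along(levels(trimestre))
labels  <- levels(trimestre)[breaks]  # label drawn at each break

# each row's label sits at its own x position
labels[falso_x]
```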

qq %>%
  ggplot(aes(falso_x, Activos))+
  geom_point()+
  geom_line()+
  geom_point(aes(falso_x, Ocupados))+
  geom_line(aes(falso_x, Ocupados))+
  scale_x_continuous(breaks=1:max(qq$falso_x), # adapt here for length
                     labels=rev(qq$AMBOS.SEXOS))+
  xlab("") + ylab("") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  geom_segment(data=minimo, aes(x=falso_x, xend=falso_x,
                                y=Ocupados, yend=Activos), color="blue")+
  geom_segment(data=maximo, aes(x=falso_x, xend=falso_x,
                                y=Ocupados, yend=Activos), color="red")

Update

To get a legend: ggplot2 really likes to have things inside aes(). As a workaround, we can map a fake colour name inside aes(), and scale_colour_manual then assigns it a real colour. Also check this answer.

This is mostly a hack. As I said above, if you go the reshape2::melt way you have other options (see below).

qq %>%
  ggplot(aes(falso_x, Activos))+
  geom_point(aes(color="Activo"))+
  geom_line(aes(color="Activo"))+
  geom_point(aes(falso_x, Ocupados, color="Ocupado"))+
  geom_line(aes(falso_x, Ocupados, color="Ocupado"))+
  scale_x_continuous(breaks=1:max(qq$falso_x), # adapt here for length
                     labels=rev(qq$AMBOS.SEXOS))+
  xlab("") + ylab("") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  geom_segment(data=minimo, aes(x=falso_x, xend=falso_x,
                                y=Ocupados, yend=Activos), color="blue")+
  geom_segment(data=maximo, aes(x=falso_x, xend=falso_x,
                                y=Ocupados, yend=Activos), color="red")+
  scale_colour_manual(name="Grupo",
                      values=c(Ocupado="darkorange",
                               Activo="green"))

`reshape2::melt` way

m <- reshape2::melt(qq, id.vars=c("AMBOS.SEXOS","falso_x"))

m %>% filter(variable!="diferencia") %>%
  ggplot(aes(falso_x, value, color=variable))+
  geom_point()+
  geom_line()+
  scale_x_continuous(breaks=1:max(qq$falso_x), # adapt here for length
                     labels=rev(qq$AMBOS.SEXOS))+
  xlab("") + ylab("") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  geom_segment(data=minimo, aes(x=falso_x, xend=falso_x,
                                y=Ocupados, yend=Activos), color="blue")+
  geom_segment(data=maximo, aes(x=falso_x, xend=falso_x,
                                y=Ocupados, yend=Activos), color="red")
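If reshape2 is not available, base R's stack() gives a similar long format. This is a swapped-in alternative, not what the answer uses. stack() returns the measured numbers in values and the source column name in the factor ind, which would take the roles of value and variable in the plot above. The small qq_demo and largo below are illustrative and use the first two rows of the question's data.

```r
# First two rows of the question's data, already converted to numbers
qq_demo <- data.frame(
  AMBOS.SEXOS = factor(c("2019TI", "2018TIV")),
  Activos     = c(1130.20, 1141.58),
  Ocupados    = c(892.47, 913.36)
)
qq_demo$falso_x <- as.integer(qq_demo$AMBOS.SEXOS)

# stack() puts the measured columns one under the other;
# the id columns are repeated once per measured column
largo <- data.frame(
  AMBOS.SEXOS = rep(qq_demo$AMBOS.SEXOS, 2),
  falso_x     = rep(qq_demo$falso_x, 2),
  stack(qq_demo[c("Activos", "Ocupados")])
)

largo
```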

Note that you could use `top_n` instead of `arrange %>% head`. — Axeman, Jul 31 '19 at 18:16
Thank you a lot, it helps me a lot. One more thing: I changed some plot characteristics, like the line colours. I need to show a legend; is that possible? I can't get one with your code. Thank you. — Csf, Aug 01 '19 at 17:06
@Csf There are 2 ways there that will get you to where you want to go, please accept the answer if it helped you. — Matias Andina, Aug 01 '19 at 17:41
@Csf Also, I would strongly recommend checking the format of your decimal points; it is likely that the Spanish setup is messing with what R expects. That being said, you can also try es.stackoverflow.com for future questions :) !! — Matias Andina, Aug 01 '19 at 17:43
Thank you a lot again. I didn't know I had to accept the answer; I think I did it. And yes, next time maybe I will try that site ;). — Csf, Aug 05 '19 at 15:59

score 0 · Accepted Answer · answered Jul 31 '19 at 17:19

I really recommend that you improve this example, but it will do the job if you keep the structure exactly as I defined it. It can be improved, though.

library(ggplot2)

a<-c("day1","day2","day3","day1","day2","day3")
b<-c(1.5,3,5,2,5,8)
d<-c("g1","g1","g1","g2","g2","g2")
df<-data.frame(a,b,d)

# one subset per group; this assumes both list the days in the same order
subdf1<-df[df$d=="g1",]
subdf2<-df[df$d=="g2",]

# positions of the biggest and the smallest difference between the groups
mdiff<-which.max(subdf2$b-subdf1$b)
ldiff<-which.min(subdf2$b-subdf1$b)

# days (x values) where those differences occur
lbound<-subdf1$a[ldiff]
mbound<-subdf2$a[mdiff]

base.plot<-ggplot(df)+
  geom_line(aes(x=a,y=b,group=d))+
  labs(x="Days",y="Values")+
  geom_point(aes(x=a,y=b),col="green")

# group=a joins the two points of each selected day into a vertical red line
base.plot+geom_line(data = df[df$a==lbound | df$a==mbound,],
                    aes(x=a,y=b, group=a),col="red")
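The subtraction subdf2$b - subdf1$b pairs rows by position, so it relies on both groups listing the days in the same order. The base R sketch below pairs them by day with merge() instead. The g2 rows are deliberately reversed to show that the result no longer depends on row order. The wide data frame and the suffixes are illustrative choices.

```r
# Same values as above, but the g2 rows are deliberately in reverse order
a <- c("day1", "day2", "day3", "day3", "day2", "day1")
b <- c(1.5, 3, 5, 8, 5, 2)
d <- c("g1", "g1", "g1", "g2", "g2", "g2")
df <- data.frame(a, b, d)

# match the two groups by day instead of by row position
wide <- merge(df[df$d == "g1", c("a", "b")],
              df[df$d == "g2", c("a", "b")],
              by = "a", suffixes = c(".g1", ".g2"))
wide$diff <- wide$b.g2 - wide$b.g1

mbound <- wide$a[which.max(wide$diff)]  # day with the biggest difference
lbound <- wide$a[which.min(wide$diff)]  # day with the smallest difference

wide
```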
