Correlating data points with ggplot2

Question

I need to present a set of statistics to my Statistics class (the variables don't necessarily have to be related or correlated). I wanted to show a plot comparing wins to the number of laps led by F1 drivers over the last 20 years. What I've got using ggplot2 is attached, and my code is as follows:

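# Start from a clean session
graphics.off()
rm(list=ls())
setwd("mypwd")
library(ggplot2)

# read.table() turns the column name "Laps lead" into Laps.lead
data <- read.table("F1 Drivers Lead by win.csv", sep = ",", col.names = c("Driver", "Laps lead", "Wins"))

# One facet per driver: wins on x, laps led on y, each point labelled with its laps led
p <- ggplot(data, aes(x = Wins, y = Laps.lead)) +
  geom_point() +
  geom_text(aes(label = Laps.lead), vjust = -1) +
  facet_wrap(~Driver) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p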

I'd like to make the comparison more obvious instead of burying the data in a million facets. I was thinking of something along the lines of a double-y-axis graph with two lines connecting the relevant data points, one for laps led and one for wins. I know that's not generally good practice, but the people who will be viewing this aren't data scientists insisting on good practice. I just need something that is easy to read at a glance. Anyone care to help me out? It's been a long time since I've done anything like this and I'm struggling.
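To make the double-y-axis idea concrete, here is a rough sketch of what I have in mind. It uses only the six rows from dput(head(data)) posted in the comments below. The names f1, k and p_dual are mine, and the scale factor k is an arbitrary choice (it lines up the two maxima), so treat this as an illustration rather than a recommended design.

```r
library(ggplot2)

# First six rows, as posted via dput(head(data)) in the comments
f1 <- data.frame(
  Driver = c("Lewis Hamilton", "Sebastian Vettel", "Michael Schumacher",
             "Fernando Alonso", "Nico Rosberg", "Kimi Raikkonen"),
  Laps.lead = c(5099L, 3495L, 3079L, 1767L, 1533L, 1305L),
  Wins = c(95L, 53L, 56L, 32L, 23L, 21L)
)

# Put drivers on the x axis, ordered from most to fewest laps led
f1$Driver <- factor(f1$Driver, levels = f1$Driver[order(-f1$Laps.lead)])

# Assumed scale factor: stretches wins so its maximum matches the laps-led maximum
k <- max(f1$Laps.lead) / max(f1$Wins)

p_dual <- ggplot(f1, aes(x = Driver, group = 1)) +
  geom_line(aes(y = Laps.lead, colour = "Laps led")) +
  geom_point(aes(y = Laps.lead, colour = "Laps led")) +
  geom_line(aes(y = Wins * k, colour = "Wins")) +
  geom_point(aes(y = Wins * k, colour = "Wins")) +
  # The secondary axis undoes the scaling so its labels read as wins
  scale_y_continuous(
    name = "Laps led",
    sec.axis = sec_axis(~ . / k, name = "Wins")
  ) +
  labs(x = NULL, colour = NULL) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p_dual
```

With all 45 drivers the x axis labels would get crowded, and where the two lines cross depends entirely on k, which is part of why this kind of plot is usually discouraged.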

Can you share some of your data with us using dput(head(data))? Without knowing what the data looks like, the best I can recommend is using different colors for each driver with aes(x = Wins, y = Laps.lead, color=Driver) rather than facetting. — Dubukay, Jan 25 '21 at 21:51
one for your statistics course: https://www.tylervigen.com/spurious-correlations — tjebo, Jan 25 '21 at 22:52
double y axis https://stackoverflow.com/questions/3099219/ggplot-with-2-y-axes-on-each-side-and-different-scales — tjebo, Jan 25 '21 at 22:53
combine points with lines https://stackoverflow.com/questions/8592585/combine-points-with-lines-with-ggplot2 — tjebo, Jan 25 '21 at 22:53
And as often, I guess, I also recommend to learn google (This is only half in jest) — tjebo, Jan 25 '21 at 22:53
@Dubukay hey, thanks for the idea on coloring by driver. I didn't think of that. I gave it a try, but the problem is that while some of the data points are very small, others are very large, and furthermore there are 45 drivers, which makes differentiation difficult. The output of dput(head(data)) is ```structure(list(Driver = c("Lewis Hamilton", "Sebastian Vettel", "Michael Schumacher", "Fernando Alonso", "Nico Rosberg", "Kimi Raikkonen" ), Laps.lead = c(5099L, 3495L, 3079L, 1767L, 1533L, 1305L), Wins = c(95L, 53L, 56L, 32L, 23L, 21L)), row.names = c(NA, 6L), class = "data.frame")``` — Andrew Yorkovich, Jan 26 '21 at 18:16
Thanks, @tjebo. I think I actually may resort to using two separate plots. I was going to do it with photo editing, but I didn't know that there was a way to do that within R, so that's cool. As for Google, I already did spend a fair bit of time before posting here, which is what brought me originally to that second link you posted. But I think now the right way is definitely to go with two combined plots. — Andrew Yorkovich, Jan 26 '21 at 18:25
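For anyone landing here later, one possible way to get the "two combined plots" within a single ggplot2 call is to stack the two measures into long format and facet by measure, each panel with its own x scale. This is only a sketch on the six rows posted above. The names f1, f1_long and p_panels are made up for illustration, and horizontal geom_col() plus expansion() assume ggplot2 3.3.0 or later.

```r
library(ggplot2)

# First six rows, as posted via dput(head(data)) above
f1 <- data.frame(
  Driver = c("Lewis Hamilton", "Sebastian Vettel", "Michael Schumacher",
             "Fernando Alonso", "Nico Rosberg", "Kimi Raikkonen"),
  Laps.lead = c(5099L, 3495L, 3079L, 1767L, 1533L, 1305L),
  Wins = c(95L, 53L, 56L, 32L, 23L, 21L)
)

# Stack the two measures into long format (base R, no extra packages)
f1_long <- data.frame(
  Driver  = rep(f1$Driver, times = 2),
  Measure = rep(c("Laps led", "Wins"), each = nrow(f1)),
  Value   = c(f1$Laps.lead, f1$Wins)
)

# Same driver order in both panels; the driver with the most wins ends up on top
f1_long$Driver <- factor(f1_long$Driver, levels = f1$Driver[order(f1$Wins)])

p_panels <- ggplot(f1_long, aes(x = Value, y = Driver)) +
  geom_col() +
  geom_text(aes(label = Value), hjust = -0.1, size = 3) +
  # One panel per measure, each with its own x scale
  facet_wrap(~Measure, scales = "free_x") +
  # Leave room on the right for the value labels
  scale_x_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(x = NULL, y = NULL)
p_panels
```

Because both panels share the driver axis, a driver whose laps-led bar is long relative to their wins bar (or the reverse) stands out at a glance, and one row per driver should stay readable with all 45 drivers.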
