
I am trying to make a plot in R look like the one I made in Python:

The one on the left is from Python (the correct one); the one on the right is from R (the one I'm trying to change).

This is the data frame for both Python and R.

All_Flights_Combined_Month

Year Month Delay_count Total_count
2003 Jan 151238 552109
2003 Feb 158369 500206
2003 Mar 152156 559342
2003 Apr 125699 527303
2003 May 136551 533782
2003 Jun 163497 536496
2003 Jul 183491 558568
2003 Aug 178979 556984
2003 Sep 113916 527714
2003 Oct 131409 552370
2003 Nov 157157 528171
2003 Dec 206743 555495
2004 Jan 198818 583987
2004 Feb 183658 553876
2004 Mar 183273 601412
2004 Apr 170114 582970
2004 May 191604 594457
2004 Jun 238074 588792
2004 Jul 237670 614166
2004 Aug 215667 623107
2004 Sep 147508 585125
2004 Oct 193951 610037
2004 Nov 197560 584610
2004 Dec 254786 606731
2005 Jan 229809 594924
2005 Feb 184920 545332
2005 Mar 226883 617540
2005 Apr 169221 594492
2005 May 178327 614802
2005 Jun 236724 609195
2005 Jul 268988 627961
2005 Aug 240410 630904
2005 Sep 165541 574253
2005 Oct 186778 592712
2005 Nov 193399 566138
2005 Dec 256861 572343

And this is the code for Python:

import matplotlib.pyplot as plt

# To plot the line graph
# Create separate data frames for each year
years = All_Flights_Combined_Month['Year'].unique()
data_frames_month = [All_Flights_Combined_Month[All_Flights_Combined_Month['Year'] == year] for year in years]

# Create subplots
fig, ax = plt.subplots(figsize=(10, 8))

# Plot Delay_count for each year
for i, year in enumerate(years):
    color = 'red' if str(year) == '2003' else 'green' if str(year) == '2004' else 'blue'
    ax.plot(data_frames_month[i]['Month'], data_frames_month[i]['Delay_count'], label=f"{year} Delay Count", color=color)

# Plot Total_Count for each year
for i, year in enumerate(years):
    color = 'orange' if str(year) == '2003' else 'yellow' if str(year) == '2004' else 'purple'
    ax.plot(data_frames_month[i]['Month'], data_frames_month[i]['Total_count'], label=f"{year} Total Count", color=color)

# Set title and labels
ax.set_title('Flight Count by Month')
ax.set_xlabel('Month')
ax.set_ylabel('Number of Flights')

# Add legend
ax.legend(title='Year')

# Save the plot as a pdf file
plt.savefig('Monthly Flight Comparison Python.pdf', format='pdf')

# Show the plot
plt.show()

And this is the code for R:

library(ggplot2)

# To plot the line graph
month_plot <- ggplot() +
  geom_line(data = All_Flights_Combined_Month, aes(x = Month, y = Delay_count, group = Year, color = Year)) +
  geom_line(data = All_Flights_Combined_Month, aes(x = Month, y = Total_count, group = Year, color = Year)) +
  scale_x_discrete(limits = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
  xlab("Months") +
  ylab("Number of Flights") +
  ggtitle("Flight Count by Month")

# To save the plot as .pdf
ggplot2::ggsave("Monthly Flight Comparison R.pdf", plot = month_plot, width = 8, height = 6)

I need the legend and the line colors to match the ones in Python. I hope I have provided sufficient information. Please advise, thank you.

I tried adding scale_color_manual to each geom_line, but ggplot2 printed a message saying that a scale for colour was already present and that the new one would replace it.
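ggplot2 keeps a single colour scale per plot, which is why adding scale_color_manual() to each geom_line() only replaces the previous scale. Mapping the numeric Year column to colour also makes that scale continuous, which produces a colour gradient. A minimal sketch, assuming a hypothetical `flights` data frame holding only the first two months of each year from the table above (substitute All_Flights_Combined_Month for the full plot): each layer maps colour to a label string, and one scale_color_manual() with the Python script's colours covers both layers.

```r
library(ggplot2)

# Hypothetical subset of the question's data: Jan and Feb of each year only
flights <- data.frame(
  Year = rep(c(2003, 2004, 2005), each = 2),
  Month = rep(c("Jan", "Feb"), times = 3),
  Delay_count = c(151238, 158369, 198818, 183658, 229809, 184920),
  Total_count = c(552109, 500206, 583987, 553876, 594924, 545332)
)

# All six colours from the Python script, delay counts first (Python's plotting order)
python_colours <- c(
  "2003 Delay Count" = "red",    "2004 Delay Count" = "green",
  "2005 Delay Count" = "blue",   "2003 Total Count" = "orange",
  "2004 Total Count" = "yellow", "2005 Total Count" = "purple"
)

month_plot <- ggplot(flights, aes(x = Month, group = Year)) +
  # A character label instead of the numeric Year gives a discrete colour scale
  geom_line(aes(y = Delay_count, colour = paste(Year, "Delay Count"))) +
  geom_line(aes(y = Total_count, colour = paste(Year, "Total Count"))) +
  scale_x_discrete(limits = month.abb) +
  # Added once; the same scale serves both geom_line() layers
  scale_color_manual(values = python_colours, breaks = names(python_colours)) +
  labs(x = "Month", y = "Number of Flights", colour = "Year",
       title = "Flight Count by Month")
```

Passing the names of `python_colours` as `breaks` lists the legend entries in the order the Python script adds its lines.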

Quinten
Joseph Ng

4 Answers


You could reshape your data to long format and combine Year with the name of the count column (Delay count or Total count) into one string using paste0 and gsub. To get the right colors, use scale_color_manual with a named values vector, and set the legend order with breaks, like this:

library(ggplot2)
library(dplyr)
library(tidyr)

df <- All_Flights_Combined_Month  # the question's data frame

df %>%
  pivot_longer(cols = Delay_count:Total_count) %>%
  mutate(Year2 = paste0(Year, " ", gsub("_", " ", name)),
         Month = factor(Month, levels = month.abb)) %>%
  ggplot(aes(x = Month, y = value, color = Year2, group = Year2)) +
  geom_line() +
  labs(color = "Year", x = "Month", y = "Number of Flights") +
  scale_color_manual(values = c("2003 Delay count" = "red", 
                                "2004 Delay count" = "green", 
                                "2005 Delay count" = "blue", 
                                "2003 Total count" = "orange", 
                                "2004 Total count" = "yellow", 
                                "2005 Total count" = "purple"),
                     breaks = c("2003 Delay count",
                                "2004 Delay count",
                                "2005 Delay count",
                                "2003 Total count", 
                                "2004 Total count", 
                                "2005 Total count"))

Created on 2023-02-19 with reprex v2.0.2
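Without `breaks`, a discrete legend lists its entries in the sorted order of the labels, which interleaves delay and total counts year by year. A small base-R sketch of the labels that paste0() and gsub() build (the `years` and `count_cols` vectors are hypothetical helpers), comparing that default order with the delay-first order the `breaks` argument imposes:

```r
years <- c(2003, 2004, 2005)
count_cols <- c("Delay_count", "Total_count")

# Same construction as the answer: year, a space, column name with "_" -> " ".
# Listed delay-first, as in the answer's `breaks` vector.
breaks_order <- paste0(rep(years, times = 2), " ",
                       gsub("_", " ", rep(count_cols, each = 3)))

# A character colour variable becomes a factor with sorted levels,
# which is the legend order when no `breaks` are given
default_order <- levels(factor(breaks_order))

print(default_order)
print(breaks_order)
```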

Quinten

This type of problem generally comes down to reshaping the data. ggplot2 works best with data in long format, and this data is in wide format. See this post on how to reshape data from wide to long format.

Then replace the variable Year with the interaction of Year and name (the column that pivot_longer fills with the original column names). That interaction is both the colour and the grouping variable.

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
  library(ggplot2)
})

clrs <- c("2003 Delay Count" = "#e44b3b", "2003 Total Count" = "#edbe70",
          "2004 Delay Count" = "#0d720d", "2004 Total Count" = "#f8f867", 
          "2005 Delay Count" = "#0000cb", "2005 Total Count" = "#6d0469")

All_Flights_Combined_Month %>%
  pivot_longer(ends_with("count")) %>%
  mutate(Month = factor(Month, levels = month.abb),
         Year = interaction(Year, name, sep = " "),
         Year = sub("_c", " C", Year)) %>%
  select(-name) %>% 
  ggplot(aes(Month, value, colour = Year, group = Year)) +
  geom_line(linewidth = 1.25) +
  scale_color_manual(values = clrs) +
  theme_minimal()

Created on 2023-02-19 with reprex v2.0.2
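The reshaping step can be seen without any packages. A minimal base-R sketch, not the answer's tidyr code: it stacks the two count columns of a hypothetical two-row slice of the data (`wide` and `long` are made-up names) and builds the same combined label that interaction() and sub() produce above.

```r
# Two rows of the question's data in wide format (one column per count type)
wide <- data.frame(
  Year = c(2003, 2003),
  Month = c("Jan", "Feb"),
  Delay_count = c(151238, 158369),
  Total_count = c(552109, 500206)
)

# Long format: one row per (Year, Month, count type), a stand-in for pivot_longer()
long <- data.frame(
  Year  = rep(wide$Year, times = 2),
  Month = rep(wide$Month, times = 2),
  name  = rep(c("Delay_count", "Total_count"), each = nrow(wide)),
  value = c(wide$Delay_count, wide$Total_count)
)

# Combine year and count type into one colour/group label
long$Year <- as.character(interaction(long$Year, long$name, sep = " "))
long$Year <- sub("_c", " C", long$Year)
print(long)
```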


Data

x <- "Year  Month   Delay_count     Total_count
2003    Jan     151238  552109
2003    Feb     158369  500206
2003    Mar     152156  559342
2003    Apr     125699  527303
2003    May     136551  533782
2003    Jun     163497  536496
2003    Jul     183491  558568
2003    Aug     178979  556984
2003    Sep     113916  527714
2003    Oct     131409  552370
2003    Nov     157157  528171
2003    Dec     206743  555495
2004    Jan     198818  583987
2004    Feb     183658  553876
2004    Mar     183273  601412
2004    Apr     170114  582970
2004    May     191604  594457
2004    Jun     238074  588792
2004    Jul     237670  614166
2004    Aug     215667  623107
2004    Sep     147508  585125
2004    Oct     193951  610037
2004    Nov     197560  584610
2004    Dec     254786  606731
2005    Jan     229809  594924
2005    Feb     184920  545332
2005    Mar     226883  617540
2005    Apr     169221  594492
2005    May     178327  614802
2005    Jun     236724  609195
2005    Jul     268988  627961
2005    Aug     240410  630904
2005    Sep     165541  574253
2005    Oct     186778  592712
2005    Nov     193399  566138
2005    Dec     256861  572343"
All_Flights_Combined_Month <- read.table(text = x, header = TRUE)


Rui Barradas

Something like this:

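library(tidyverse)

df <- All_Flights_Combined_Month  # the question's data frame

df %>% 
  pivot_longer(-c(Year, Month)) %>%  
  mutate(Year = paste(Year, name)) %>% 
  ggplot(aes(x = Month, y = value, color = factor(Year))) +
  geom_line(aes(group = Year)) +
  scale_x_discrete(limits = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
  # Named values tie each series to the colour used in the Python script
  scale_color_manual(values = c("2003 Delay_count" = "red",    "2004 Delay_count" = "green",
                                "2005 Delay_count" = "blue",   "2003 Total_count" = "orange",
                                "2004 Total_count" = "yellow", "2005 Total_count" = "purple")) +
  labs(color = "Year") +
  xlab("Months") +
  ylab("Number of Flights") +
  ggtitle("Flight Count by Month") +
  theme_classic()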
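When `values` in scale_color_manual() is an unnamed vector, the colours are assigned to the scale's levels in order, and for a character variable those levels are sorted alphabetically. A small base-R sketch of that pairing for the labels paste(Year, name) produces, using the unnamed vector c("purple", "yellow", "gold", "blue", "green", "red") as an example: "2003 Delay_count" ends up purple rather than the red used in Python. A named vector states the pairing explicitly.

```r
# Labels as produced by paste(Year, name) after pivot_longer()
labels <- paste(rep(c(2003, 2004, 2005), each = 2),
                c("Delay_count", "Total_count"))

# Unnamed values are paired with the sorted factor levels, position by position
unnamed <- c("purple", "yellow", "gold", "blue", "green", "red")
pairing <- setNames(unnamed, levels(factor(labels)))
print(pairing)

# Named values do not depend on level order
named <- c("2003 Delay_count" = "red",    "2004 Delay_count" = "green",
           "2005 Delay_count" = "blue",   "2003 Total_count" = "orange",
           "2004 Total_count" = "yellow", "2005 Total_count" = "purple")
print(named[levels(factor(labels))])
```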


TarJae

Using just base R: first reshape into wide format, then use matplot and customize the axes and mtext a little.

dat <- All_Flights_Combined_Month  # the question's data frame
dat_w <- reshape(dat, idvar='Month', timevar='Year', direction='w')

par(mar=c(5, 6, 4, 2))
matplot(dat_w[, -1], type='l', lty=1, col=2:7, axes=FALSE, ylab='', main='Flight Count By Month')
axis(side=1, at=1:12, labels=dat_w$Month, cex.axis=.8)
axis(2, axTicks(2), formatC(axTicks(2), format='f', digits=0), las=2, cex.axis=.8)
mtext('Month', side=1, line=2.5, cex=.8); mtext('Number of Flights', 2, 4, cex=.8)
# dat_w's columns alternate Delay/Total within each year, so the labels must too
legend('right', paste(rep(unique(dat$Year), each=2), gsub('_', ' ', names(dat)[3:4])),
       col=2:7, lty=1, title='Year', cex=.7)
box()
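matplot() draws one line per column of dat_w, so the legend labels have to follow the column order that reshape() creates. A minimal sketch on a hypothetical two-month slice of the data that prints those column names and derives the legend labels directly from them, so the two cannot drift apart (`legend_labels` is a made-up name):

```r
# Hypothetical subset of the question's data: Jan and Feb of each year only
dat <- data.frame(
  Year = rep(c(2003, 2004, 2005), each = 2),
  Month = rep(c("Jan", "Feb"), times = 3),
  Delay_count = c(151238, 158369, 198818, 183658, 229809, 184920),
  Total_count = c(552109, 500206, 583987, 553876, 594924, 545332)
)
dat_w <- reshape(dat, idvar = "Month", timevar = "Year", direction = "wide")

# New columns are named <variable>.<year>, Delay then Total within each year
print(names(dat_w))

# "Delay_count.2003" -> "2003 Delay count", one label per matplot column
legend_labels <- sub("^([A-Za-z]+)_count\\.([0-9]+)$", "\\2 \\1 count",
                     names(dat_w)[-1])
print(legend_labels)
```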


jay.sf