I have a problem plotting multiple grouped lines with geom_line in ggplot2. It arises when I try to group the lines by one variable (column), namely Region.

GDP_time_series_analysis %>% 
  group_by(Region) %>% 
  ggplot()+geom_line(aes(Year, Total_GDP, group=Region, color=Region))

The code above produces a graph that is correct only for one region (the purple line), which consists of a single county (County is another variable). It is not correct for the other three regions, which contain several counties. I suspect a grouping problem: I cannot get each of the other three regions drawn as one line, even though, as you can see, I used group_by(Region) in the code.

Sorry if this question is not quite up to the standards (it is my first one here), and thank you. A subset of the data follows:

 structure(list(County = c("City of Zagreb", "City of Zagreb", 
"City of Zagreb", "City of Zagreb", "City of Zagreb", "City of Zagreb", 
"City of Zagreb", "City of Zagreb", "City of Zagreb", "City of Zagreb", 
 "City of Zagreb", "City of Zagreb", "City of Zagreb", "City of Zagreb", 
 "City of Zagreb", "City of Zagreb", "City of Zagreb", "City of Zagreb", 
 "Zagreb County", "Zagreb County", "Zagreb County", "Zagreb County", 
 "Zagreb County", "Zagreb County", "Zagreb County", "Zagreb County", 
 "Zagreb County", "Zagreb County", "Zagreb County", "Zagreb County"
  ), Region = c("Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb", 
 "Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb", 
  "Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb", "Zagreb", "North        Croatia", 
 "North Croatia", "North Croatia", "North Croatia", "North Croatia", 
 "North Croatia", "North Croatia", "North Croatia", "North Croatia", 
  "North Croatia", "North Croatia", "North Croatia"), Year = c(2000, 
  2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 
  2012, 2013, 2014, 2015, 2016, 2017, 2000, 2001, 2002, 2003, 2004, 
  2005, 2006, 2007, 2008, 2009, 2010, 2011), Population = c(771000, 
  771000, 772000, 772000, 775000, 776000, 778000, 780000, 783000, 
   785000, 788000, 790000, 792000, 795000, 798000, 8e+05, 802000, 
  803000, 296000, 296000, 299000, 302000, 305000, 307000, 310000, 
  312000, 314000, 315000, 317000, 317000), GDP_percap_EUR =        c(8975.53835599625, 
  10168.0040269207, 11091.6676199461, 12240.0345558531, 13421.0447587177, 
   15085.3049042075, 16647.4994908354, 18025.966664434, 19706.5391945802, 
   18534.1115208295, 19739.3466772558, 19408.6216726494,          18961.2735614516, 
   18546.0140474649, 18477.4378485715, 18994.6373722612, 19710.3754557913, 
  20849.7073006642, 4335.38213876616, 4307.23697694032, 5278.97949713334, 
  5459.93196849043, 5967.08989896781, 6687.19494658443, 6861.43232701965, 
  7759.05700432905, 8446.22608743048, 8086.60105100451, 7541.08792074132, 
  7667.23597749996), GDP_percap_PPP_EU_100 = c(80.0982702062271, 
  82.6988344044675, 85.4138484640405, 91.204873884138, 93.9216165828703, 
  99.0724656137407, 104.305150969215, 107.963791825045, 111.305636873515, 
  109.91689646398, 111.438020798517, 110.735014385039, 110.140140004045, 
  107.718076160351, 105.910224718338, 106.327225119802, 107.021331220602, 
  108.151130040081, 38.6892235568413, 35.0317994125204, 40.6519533638096, 
  40.6839052888146, 41.7582043486098, 43.9180311969089, 42.9904043624586, 
  46.4716944599064, 47.7056151035234, 47.9577394076775, 42.5730357896448, 
  43.7450685876577), Total_GDP = c(6920140072.47311, 7839531104.75587, 
  8562767402.59836, 9449306677.11856, 10401309688.0062, 11706196605.665, 
  12951754603.8699, 14060253998.2585, 15430220189.3563, 14549277543.8512, 
  15554605181.6776, 15332811121.393, 15017328660.6697, 14744081167.7346, 
   14744995403.16, 15195709897.809, 15807721115.5446, 16742314962.4333, 
  1283273113.07478, 1274942145.17433, 1578414869.64287, 1648899454.48411, 
  1819962419.18518, 2052968848.60142, 2127044021.37609, 2420825785.35066, 
  2652114991.45317, 2547279331.06642, 2390524870.875, 2430513804.86749
   )), row.names = c(NA, -30L), class = c("tbl_df", "tbl", "data.frame"
    ))
Waldi
kosk
  • Your data is incomplete. It misses closing `))`; and it does not contain enough data to reproduce your plot. – ziggystar Feb 10 '21 at 11:10
  • Sorry, not sure how to do this because to reproduce the plot I would need to put the whole data-set. I will try to correct it. – kosk Feb 10 '21 at 11:19
  • `ggplot(GDP_time_series_analysis, aes(Year, Total_GDP, colour = Region)) + geom_line()`. No need to `group_by` nor `group = Region`. – Rui Barradas Feb 10 '21 at 11:21
  • Can you post sample data more representative of the problem? Please edit **the question** with the output of `dput(GDP_time_series_analysis)`. Or, if it is too big with the output of `dput(head(GDP_time_series_analysis, 30))`. – Rui Barradas Feb 10 '21 at 11:22
  • Tnx for suggestion, I did this, as Rui suggested: dput(head(GDP_time_series_analysis, 30)) – kosk Feb 10 '21 at 11:32

2 Answers

The issue is that your data is at County level, but you are plotting it at Region level, which is less granular. If you plot the data directly the way you did, each group ends up with multiple values per year, so the line jumps between counties. You have to apply a summary statistic to get meaningful results.

Here is a small illustration using some dummy data:

library(tibble)
library(ggplot2)

df <- tibble(County = rep(c("Krapina-Zagorje", "Varaždin", "Zagreb"), each = 3),
             Region = rep(c("North Croatia", "North Croatia", "Zagreb"), each = 3),
             Year = rep(2015:2017, 3),
             GDP = 1:9)
# North Croatia has two counties, so each year has two GDP values in that group
ggplot(df, aes(x = Year, y = GDP, colour = Region, group = Region)) +
  geom_line() + geom_point()

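To confirm that this is what is happening, you could count the rows per Region and Year. This is only a quick diagnostic sketch on the same dummy data; the name `rows_per_group` is made up for the illustration. Any count above 1 means geom_line() receives several y values for a single x within one group.

```r
library(dplyr)
library(tibble)

# Same dummy data as in the illustration above
df <- tibble(County = rep(c("Krapina-Zagorje", "Varaždin", "Zagreb"), each = 3),
             Region = rep(c("North Croatia", "North Croatia", "Zagreb"), each = 3),
             Year = rep(2015:2017, 3),
             GDP = 1:9)

# One row per Region/Year combination, with the number of source rows in column n
rows_per_group <- count(df, Region, Year)
print(rows_per_group)
```

On your real data the same call would be `count(GDP_time_series_analysis, Region, Year)`.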

Since you need only one value per group and year, you have to summarise your data accordingly (I assume you are interested in the total sum per group):

ggplot(df, aes(x = Year, y = GDP, colour =Region, group = Region)) + stat_summary(fun = sum, geom = "line")

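If you prefer to aggregate before plotting, which is probably what the group_by() in the question was meant to do, a possible sketch is below. Note that group_by() on its own does not change what ggplot draws; it needs a summarise() step. The name `region_totals` is hypothetical, and summing is again an assumption about what you want.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Same dummy data as in the illustration above
df <- tibble(County = rep(c("Krapina-Zagorje", "Varaždin", "Zagreb"), each = 3),
             Region = rep(c("North Croatia", "North Croatia", "Zagreb"), each = 3),
             Year = rep(2015:2017, 3),
             GDP = 1:9)

# Collapse the county rows into one total per Region and Year
region_totals <- df %>%
  group_by(Region, Year) %>%
  summarise(GDP = sum(GDP), .groups = "drop")

# With one value per Region and Year, a plain geom_line() draws one line per region
p <- ggplot(region_totals, aes(x = Year, y = GDP, colour = Region)) +
  geom_line()
print(p)
```

This should give the same lines as the stat_summary() version; the difference is only that the summarised table is available for inspection or reuse.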

wici
Your data appears to be already grouped. You can simply do this:

ggplot(data,aes(x=Year,y=Total_GDP, color=Region)) + geom_line()

Are you sure your example is representative? You also have a typo in your data: one row contains "North        Croatia" with extra blanks between the words, so it does not match the other "North Croatia" rows and is treated as a separate region.
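One way to repair such values is to collapse runs of whitespace before plotting. This is a minimal base R sketch on a made-up vector; `region` and `region_clean` are names used only for this illustration.

```r
# Example values, including the variant with extra blanks
region <- c("North        Croatia", "North Croatia", "Zagreb")

# Trim the ends, then replace every run of whitespace with a single space
region_clean <- gsub("\\s+", " ", trimws(region))

print(unique(region_clean))
```

Applied to your data, that would be something like `GDP_time_series_analysis$Region <- gsub("\\s+", " ", trimws(GDP_time_series_analysis$Region))`.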

ziggystar