
I would like to plot one continuous line across different groups of data in ggplot. My dataset has three groups, labelled 1, 2 and 3.

ggplot(Dataset_v2.2.clusterplot, aes(Date, Number_IPOs, color = factor(label)))+
  geom_point()+
  geom_line()

[Image: plot produced by the code above]

My problem is that group 2 (in green) is spread across several separate time intervals, and ggplot joins those intervals to each other. Because of the color mapping, ggplot splits my data into three lines, whereas originally there was only one line, as follows:

[Image: the same series drawn as a single continuous line]

Is there a way to color the line by the three groups without gaps between groups and without joining the separate intervals of the same group?

And here is the data:

Dataset_v2.2.clusterplot <- structure(list(Date = structure(c(959817600, 962409600, 965088000, 
                                  967766400, 970358400, 973036800, 975628800, 978307200, 980985600, 
                                  983404800, 986083200, 988675200, 991353600, 993945600, 996624000, 
                                  999302400, 1001894400, 1004572800, 1007164800, 1009843200, 1012521600, 
                                  1014940800, 1017619200, 1020211200, 1022889600, 1025481600, 1028160000, 
                                  1030838400, 1033430400, 1036108800, 1038700800, 1041379200, 1044057600, 
                                  1046476800, 1049155200, 1051747200, 1054425600, 1057017600, 1059696000, 
                                  1062374400, 1064966400, 1067644800, 1070236800, 1072915200, 1075593600, 
                                  1078099200, 1080777600, 1083369600, 1086048000, 1088640000, 1091318400, 
                                  1093996800, 1096588800, 1099267200, 1101859200, 1104537600, 1107216000, 
                                  1109635200, 1112313600, 1114905600, 1117584000, 1120176000, 1122854400, 
                                  1125532800, 1128124800, 1130803200, 1133395200, 1136073600, 1138752000, 
                                  1141171200, 1143849600, 1146441600, 1149120000, 1151712000, 1154390400, 
                                  1157068800, 1159660800, 1162339200, 1164931200, 1167609600, 1170288000, 
                                  1172707200, 1175385600, 1177977600, 1180656000, 1183248000, 1185926400, 
                                  1188604800, 1191196800, 1193875200, 1196467200, 1199145600, 1201824000, 
                                  1204329600, 1207008000, 1209600000, 1212278400, 1214870400, 1217548800, 
                                  1220227200, 1222819200, 1225497600, 1228089600, 1230768000, 1233446400, 
                                  1235865600, 1238544000, 1241136000, 1243814400, 1246406400, 1249084800, 
                                  1251763200, 1254355200, 1257033600, 1259625600, 1262304000, 1264982400, 
                                  1267401600, 1270080000, 1272672000, 1275350400, 1277942400, 1280620800, 
                                  1283299200, 1285891200, 1288569600, 1291161600, 1293840000, 1296518400, 
                                  1298937600, 1301616000, 1304208000, 1306886400, 1309478400, 1312156800, 
                                  1314835200, 1317427200, 1320105600, 1322697600, 1325376000, 1328054400, 
                                  1330560000, 1333238400, 1335830400, 1338508800, 1341100800, 1343779200, 
                                  1346457600, 1349049600, 1351728000, 1354320000, 1356998400, 1359676800, 
                                  1362096000, 1364774400, 1367366400, 1370044800, 1372636800, 1375315200, 
                                  1377993600, 1380585600, 1383264000, 1385856000, 1388534400, 1391212800, 
                                  1393632000, 1396310400, 1398902400, 1401580800, 1404172800, 1406851200, 
                                  1409529600, 1412121600, 1414800000, 1417392000, 1420070400, 1422748800, 
                                  1425168000, 1427846400, 1430438400, 1433116800, 1435708800, 1438387200, 
                                  1441065600, 1443657600, 1446336000, 1448928000, 1451606400, 1454284800, 
                                  1456790400, 1459468800, 1462060800, 1464739200, 1467331200, 1470009600, 
                                  1472688000, 1475280000, 1477958400, 1480550400, 1483228800, 1485907200, 
                                  1488326400, 1491004800, 1493596800, 1496275200), class = c("POSIXct", 
                                                                                             "POSIXt"), tzone = "UTC"), Number_IPOs = c(35, 42, 67, 25, 23, 
                                                                                                                                        22, 8, 3, 9, 8, 3, 10, 15, 8, 5, 0, 8, 13, 12, 1, 9, 7, 8, 15, 
                                                                                                                                        9, 6, 1, 0, 10, 10, 6, 0, 4, 1, 0, 2, 3, 7, 7, 6, 12, 16, 22, 
                                                                                                                                        5, 19, 16, 12, 15, 28, 28, 17, 15, 31, 16, 30, 9, 25, 7, 10, 
                                                                                                                                        14, 27, 23, 35, 17, 17, 20, 22, 16, 22, 17, 21, 17, 20, 10, 9, 
                                                                                                                                        15, 26, 28, 36, 10, 18, 16, 17, 34, 24, 20, 16, 7, 33, 38, 23, 
                                                                                                                                        12, 7, 5, 5, 7, 3, 4, 3, 0, 0, 1, 0, 0, 1, 0, 3, 3, 6, 2, 5, 
                                                                                                                                        8, 14, 12, 7, 6, 8, 13, 14, 14, 11, 11, 12, 7, 22, 21, 21, 7, 
                                                                                                                                        19, 8, 21, 24, 13, 16, 4, 0, 3, 17, 11, 3, 19, 22, 18, 12, 4, 
                                                                                                                                        12, 7, 8, 24, 9, 8, 11, 9, 14, 14, 28, 19, 19, 20, 22, 32, 29, 
                                                                                                                                        13, 18, 23, 29, 29, 21, 31, 36, 14, 14, 32, 26, 15, 10, 15, 5, 
                                                                                                                                        14, 19, 26, 22, 10, 6, 16, 12, 2, 0, 5, 4, 7, 15, 7, 13, 8, 17, 
                                                                                                                                        21, 2, 7, 9, 8, 13, 20, 20, 21), label = c(3L, 3L, 3L, 3L, 3L, 
                                                                                                                                                                                   3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 
                                                                                                                                                                                   3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
                                                                                                                                                                                   3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
                                                                                                                                                                                   3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
                                                                                                                                                                                   3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
                                                                                                                                                                                   3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                                                                                                                                                                   2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
                                                                                                                                                                                   1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                   1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                                                   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("Date", "Number_IPOs", 
                                                                                                                                                                                                                                "label"), class = "data.frame", row.names = c(NA, -205L))
Michael Harper
SCW16
    Yes it has! Thank you for this straightforward and easily implementable solution! I think I used factors unconsciously, because I once had data in a format not suitable for the color function and it just worked when I used factors. – SCW16 Nov 28 '17 at 18:45

1 Answer


geom_path is better suited when you want to connect points in sequence: http://ggplot2.tidyverse.org/reference/geom_path.html. As explained there:

"geom_path connects the observations in the order in which they appear in the data."

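On its own, though, it behaves unexpectedly when a factor is mapped to color, as you have here: ggplot2 uses the discrete variable to split the data into groups, so each level is still drawn as its own path. For example, you might expect the following to work: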

ggplot(Dataset_v2.2.clusterplot, aes(Date, Number_IPOs, color = factor(label)))+
  geom_path() +
  geom_point()

[Image: plot produced by the code above]
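To see why the line is still split, it can help to inspect the groups ggplot2 computes for the layer. The following is a minimal sketch on a small made-up data frame (`toy` and its values are assumptions for illustration, not the asker's data); it uses `layer_data()` to read the `group` column that ggplot2 derives from the discrete color mapping.

```r
library(ggplot2)

# Hypothetical toy data: one series whose label changes back and forth over time
toy <- data.frame(
  x     = 1:8,
  y     = c(3, 5, 4, 6, 2, 7, 5, 8),
  label = c(1, 1, 2, 2, 1, 1, 3, 3)
)

# Same structure as the plot above: color mapped to a factor, no explicit group
p_split <- ggplot(toy, aes(x, y, color = factor(label))) +
  geom_path()

# layer_data() returns the data ggplot2 computed for a layer, including `group`
groups_split <- unique(layer_data(p_split, 1)$group)
length(groups_split)  # one group per factor level, so one path per level
```

Each distinct group is drawn as a separate path, which is why points with the same label are joined across the intervals in between.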

An easy way to fix this is to specify a group for the line inside the aes() of geom_path. As you want one continuous line, I set group = 1 (the value is arbitrary; any constant works):

ggplot(Dataset_v2.2.clusterplot, aes(Date, Number_IPOs, color = factor(label)))+
  geom_path(aes(group = 1)) +
  geom_point()

[Image: plot produced by the code above]
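As a quick check of the fix, here is a minimal sketch on a small made-up data frame (`toy` and its values are assumptions for illustration, not the asker's data). It confirms that with `group = 1` the layer has a single group while the colors still vary along the line.

```r
library(ggplot2)

# Hypothetical toy data: one series whose label changes back and forth over time
toy <- data.frame(
  x     = 1:8,
  y     = c(3, 5, 4, 6, 2, 7, 5, 8),
  label = c(1, 1, 2, 2, 1, 1, 3, 3)
)

# A constant group overrides the grouping implied by the factor color mapping
p_joined <- ggplot(toy, aes(x, y, color = factor(label))) +
  geom_path(aes(group = 1)) +
  geom_point()

built <- layer_data(p_joined, 1)  # computed data for the geom_path layer
n_groups  <- length(unique(built$group))   # a single group: one unbroken path
n_colours <- length(unique(built$colour))  # colors still differ along that path
```

Printing `p_joined` draws the plot. In the ggplot2 versions I have used, each segment of such a path appears to take the color of the point it starts from, so the color changes exactly where the label changes.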

Michael Harper