I would like to plot 5 lines of best fit summarising 5 groups of paired points (i.e. summarising the lines between the two points)
I have a data set structured like this (readings of yield and rainfall for 500 farms grouped into 5 production systems for 2 years. Each farm occurs twice in the data set 2015 and 2017):
FARM ID (cocoa$XACIB) // YEAR // RAINFALL (cocoa$X15_clim) // YIELD (cocoa$TY_15.1.16_HA) // SYSTEM (cocoa$Production_system)
1 // 2015 // 1200 // 500 // A
1 // 2017 // 1400 // 600 // A
2 // 2015 // 900 // 400 // B
Reproducible data set at bottom...
I want to create a scatter plot of RAINFALL (x axis) vs YIELD (y axis) with points grouped by SYSTEM and a line of best fit for each of the 5 SYSTEMS based on the paired change in YIELD for each ID (Farm).
Currently I have got stuck with, generating a graph of paired data but no trendline:
ggplot(cocoa, aes(x=cocoa$X15_clim, y=cocoa$TY_15.1.16_HA, color=cocoa$Production_system)) +
geom_point() +
geom_line(group=cocoa$XACIB)
paired data with lines but no trendlines
or stuck with a graph of grouped data with trendlines for production systems but not baed on the pairs based on all the data for each production system:
ggplot(cocoa, aes(x=cocoa$X15_clim, y=cocoa$TY_15.1.16_HA, color=cocoa$Production_system)) +
geom_point() +
geom_smooth(method=lm, se =FALSE)
trend lines for groups but not pairs...
Please could someone help me find a way to produce a trend line for each group based on the paired data?
MANY THANKS!
dput(cocoa)
structure(list(Production_system = structure(c(4L, 4L, 4L, 4L,
1L, 1L, 4L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 1L, 1L, 3L, 3L, 1L, 1L,
3L, 3L, 5L, 5L, 5L, 5L, 4L, 4L, 3L, 3L, 4L, 4L, 3L, 3L, 2L, 2L,
5L, 5L, 3L, 3L, 5L, 5L, 5L, 5L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L,
5L, 5L, 4L, 4L, 4L, 4L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L
), .Label = c("A_CONV1", "B_UTZ", "C_ORG", "D_CONV2", "E_RA"), class = "factor"),
XACIB = c(22000, 22000, 30000, 30000, 35701, 35701, 42016,
42016, 44880, 44880, 46800, 46800, 48000, 48000, 56672, 56672,
62119, 62119, 65318.4, 65318.4, 76916, 76916, 79040, 79040,
82194, 82194, 85800, 85800, 90576, 90576, 90720, 90720, 92252.8,
92252.8, 93728, 93728, 94080, 94080, 105473, 105473, 105600,
105600, 108160, 108160, 108820, 108820, 111385.2, 111385.2,
112060, 112060, 113568, 113568, 116280, 116280, 119145, 119145,
120392, 120392, 128625, 128625, 131040, 131040, 135000, 135000,
138256, 138256, 143208, 143208), TY_15.1.16_HA = c(459.6153,
306.4102, NA, 905.3028636, NA, 229.80765, 229.80765, 612.8204,
NA, 15.95886458, 306.4102, 204.2734667, 344.711475, 689.42295,
229.80765, 383.01275, 306.4102, 229.80765, 102.1367333, 153.2051,
183.84612, 344.711475, 61.28204, 91.92306, 901.2064706, 1201.608627,
919.2306, 766.0255, 536.21785, 153.2051, 265.5555067, 306.4102,
NA, 229.80765, 95.7531875, 287.2595625, 510.6836667, 638.3545833,
127.6709167, 306.4102, 14.36297813, 153.2051, 612.8204, 1021.367333,
383.01275, 510.6836667, 87.54577143, 131.3186571, 200.3451308,
223.9151462, 107.24357, 199.16663, 868.1622333, 800.0710778,
465.289563, 635.5174519, 932.3624657, 875.4577143, 357.4785667,
510.6836667, 255.3418333, 153.2051, 218.8644286, 21.88644286,
103.8922084, 146.5023769, 282.8401846, 306.4102), X15_clim = c(1148.415157,
1413.089005, 1148.415157, 1413.089005, 1214.720448, 1257.496108,
1136.028355, 1383.640551, 1270.817029, 1297.849306, 1270.817029,
1297.849306, 1136.028355, 1383.640551, 1244.262682, 1281.788645,
1214.136308, 1279.60737, 1244.262682, 1281.788645, 1270.817029,
1297.849306, 1118.994984, 1350.821456, 1125.50238, 1370.241065,
1136.028355, 1383.640551, 1214.136308, 1279.60737, 1126.707093,
1391.304566, 1245.721294, 1307.936407, 1240.241637, 1267.070353,
1125.50238, 1370.241065, 1270.817029, 1297.849306, 1094.358686,
1346.225715, 1118.994984, 1350.821456, 1214.720448, 1257.496108,
1246.361514, 1266.872479, 1245.721294, 1307.936407, 1136.028355,
1383.640551, 1118.994984, 1350.821456, 1136.028355, 1383.640551,
1136.028355, 1383.640551, 1214.136308, 1279.60737, 1139.349273,
1397.517303, 1148.415157, 1413.089005, 1214.720448, 1257.496108,
1214.720448, 1257.496108)), class = "data.frame", row.names = c(NA,
-68L))