Here's the situation:
I have two datasets containing the monthly prices of different financial products between 02/2013 and 09/2019, where those prices are available. Both datasets have 47 observations (financial product IDs) and 80 variables (months).
Dataset 1 contains fewer data points for this period, but the quality of its data is better.
The first part of my problem (which I have resolved) was to create a new dataframe that keeps all available data from Dataset 1 and fills in values from Dataset 2 wherever Dataset 1 has none. The result contains all the quality data from Dataset 1 plus supplementary data from Dataset 2.
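A minimal sketch of this kind of fill-in step, assuming both data frames share the same row names (product IDs) and column names (months). The toy data frames d1 and d2 and the names combined and source_map are illustrative assumptions, not taken from the actual datasets. The sketch also records where each cell came from, which is the information a color-coded plot would need.

```r
# Toy stand-ins for the two datasets (hypothetical values; same layout as
# described above: rows = product IDs, columns = months).
d1 <- data.frame(`201901` = c(NA, 100.4, NA),
                 `201902` = c(99.0, NA, NA),
                 row.names = c("A", "B", "C"), check.names = FALSE)
d2 <- data.frame(`201901` = c(98.5, 93.8, NA),
                 `201902` = c(100.0, 97.1, 96.9),
                 row.names = c("A", "B", "C"), check.names = FALSE)

m1 <- as.matrix(d1)
# Align Dataset 2 to the row and column order of Dataset 1.
m2 <- as.matrix(d2)[rownames(m1), colnames(m1)]

# Keep Dataset 1 where it has a value, otherwise fall back to Dataset 2.
combined <- ifelse(is.na(m1), m2, m1)

# Record the origin of every cell in a matrix of the same shape.
source_map <- ifelse(!is.na(m1), "dataset1",
              ifelse(!is.na(m2), "dataset2", "missing"))

combined_df <- as.data.frame(combined)
```

Working on matrices keeps the two results (values and origins) aligned cell by cell, so source_map can later be reshaped for plotting without re-deriving anything.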
Now, I am trying to illustrate this process through ggplot.
I have plotted the missing data separately for each dataset using:
(1) missingness maps from the Amelia package
library(Amelia)
missmap(df, rank.order = FALSE, col = c("white", "black"))
(2) md.pattern from the mice package
library(mice)
md.pattern(df)
I would like to plot my final dataset, which contains both types of data, in one of these formats, using a color code to clearly show which values were added from Dataset 2 to the data from Dataset 1. Is this possible?
Here are subsets of both datasets:
dput(df1)
structure(list(`201811` = c(NA, NA, NA, NA, 95.5237185244587,
NA, 97.5075873015873, NA, NA, NA), `201812` = c(NA, NA, NA, NA,
95.2207352941176, NA, 98.6600228310502, NA, NA, NA), `201901` = c(NA,
NA, NA, NA, 93.1981693949331, NA, 100.441459234609, NA, NA, 98.789
), `201902` = c(NA, NA, NA, NA, 98.1906626506024, NA, 100.144885961747,
NA, NA, 99.029), `201903` = c(NA, 101.376, NA, NA, 100.10447592068,
NA, 100.95874067937, NA, 103.374571428571, 99.743), `201904` = c(NA,
101.966785714286, NA, NA, 101.686565217391, NA, 100.711654559226,
NA, 103.411, 99.517)), row.names = c("929043AH0", "75884RAT0",
"62943WAA7", "88104LAA1", "62943WAB5", "268317AS3", "037833BU3",
"88104LAB9", "25389JAL0", "865622BY9"), class = "data.frame")
dput(df2)
structure(list(`201811` = c(97.069, 93.375, 99.8809, 94.576,
99.849, 96.551, 93.5, 94.8075, 88.8982, 92.8731), `201812` = c(97.638,
93.75, 99.9679, 94.613, 99.831, 96.692, 93.375, 94.8904, 89.1294,
93.293), `201901` = c(98.506, 94.924, 99.9968, 96.488, 100.962,
97.371, 93.75, 97.6666, 91.3518, 98.2993), `201902` = c(100.026,
97.289, 99.9968, 96.92, 101.194, 97.274, 97.125, 97.8991, 93.3958,
97.7391), `201903` = c(99.779, 96.78, 99.9968, 96.919, 101.315,
97.691, 97.7515, 98.1629, 93.0283, 97.8553), `201904` = c(100.665,
98.971, 99.9968, 98.289, 102.869, 98.402, 98.2492, 99.4818, 95.7858,
100.6429)), row.names = c("929043AH0", "75884RAT0", "62943WAA7",
"88104LAA1", "62943WAB5", "268317AS3", "037833BU3", "88104LAB9",
"25389JAL0", "865622BY9"), class = "data.frame")
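To make the desired output concrete, here is a rough sketch of a color-coded map drawn with ggplot2 and geom_tile rather than with missmap or md.pattern themselves. It assumes the two data frames share row and column names. The toy inputs d1 and d2, the cells data frame, and the chosen colors are illustrative assumptions, not part of the actual data.

```r
library(ggplot2)

# Toy stand-ins (hypothetical values) with the same layout as df1 and df2:
# rows = product IDs, columns = months.
d1 <- data.frame(`201901` = c(NA, 100.4, NA),
                 `201902` = c(99.0, NA, NA),
                 row.names = c("A", "B", "C"), check.names = FALSE)
d2 <- data.frame(`201901` = c(98.5, 93.8, NA),
                 `201902` = c(100.0, 97.1, 96.9),
                 row.names = c("A", "B", "C"), check.names = FALSE)

m1 <- as.matrix(d1)
m2 <- as.matrix(d2)[rownames(m1), colnames(m1)]  # align to Dataset 1

# Long format: one row per (product, month) cell. expand.grid varies the
# first argument fastest, which matches the column-major order of as.vector().
cells <- expand.grid(product = rownames(m1), month = colnames(m1),
                     stringsAsFactors = FALSE)
cells$source <- ifelse(!is.na(as.vector(m1)), "Dataset 1",
                ifelse(!is.na(as.vector(m2)), "Dataset 2", "Missing"))
cells$source <- factor(cells$source,
                       levels = c("Dataset 1", "Dataset 2", "Missing"))

# One tile per cell, filled according to where the value came from.
p <- ggplot(cells, aes(x = month, y = product, fill = source)) +
  geom_tile(color = "grey70") +
  scale_fill_manual(values = c("Dataset 1" = "black",
                               "Dataset 2" = "steelblue",
                               "Missing"   = "white"),
                    drop = FALSE) +
  labs(x = "Month", y = "Product ID", fill = "Source") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

print(p)
```

Replacing d1 and d2 with df1 and df2 should give the same kind of map for the subsets above, with Dataset 2 values shown in a separate color from Dataset 1 values.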