So I have two dataframes each from a different year. The second year df has additional rows as well as updated values in some cells. My goal here is to get a new dataframe which shows just the things that have been added or changed, everything else can be 0,NA or removed.
Take a look at df 1 (y1):
project_ID sequence item q1 q2 q3 q4
NA NA NA NA NA NA NA
NA 207 period 201h 202h 203h 204h
NA 222 prepayment 1202 202.3 99 2922
2455 271 prepayment_2 1000 1000 1000 1000
2929 780 UPS 50 51 52 53
NA NA NA NA NA NA NA
So now a year goes by and I have this new data set (y2), note the differing rows and certain changed values.
project_ID sequence item q1 q2 q3 q4
NA NA NA NA NA NA NA
NA 207 period 201h 202h 203h 204h
NA 222 prepayment 1202 202.3 99 2922
2455 271 prepayment_2 999 999 1002 1000
3002 299 payment 500 500 500 500
2929 780 UPS 50 51 52 53
NA NA NA NA NA NA NA
So I attempted to use the compare() function in library(compare) however this does not have the functionality I am looking for, to my knowledge.
cmp<- compare(df1,df2)
cmp$tM
but this doesn't really help me especially since the rows are different. Furthermore it would only tell me which are different without calculating the differences.
So what I would like to see is a new dataframe which looks like this:
project_ID sequence item q1 q2 q3 q4
2455 271 prepayment_2 -1 -1 2
3002 299 payment 500 500 500 500
Now this is the best way I can think of doing it but at it's core I just need a new df with just what has changed and the difference of the values that have changed; spacing is not that important and if it would be easier to lay it out differently I'm all ears.
EDIT: Here are the two df's for R.
y1<- structure(list(project_ID = c("NA", "NA", "NA", "2455", "2929",
"NA"), sequence = c("NA", "207", "222", "271", "780", "NA"),
item = c("NA", "period", "prepayment", "prepayment_2", "UPS",
"NA"), q1 = c("NA", "201h", "1202", "1000", "50", "NA"),
q2 = c("NA", "202h", "202.3", "1000", "51", "NA"), q3 = c("NA",
"203h", "99", "1000", "52", "NA"), q4 = c("NA", "204h", "2922",
"1000", "53", "NA")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
y2 <- structure(list(project_ID = c("NA", "NA", "NA", "2455", "3002",
"2929", "NA"), sequence = c("NA", "207", "222", "271", "299",
"780", "NA"), item = c("NA", "period", "prepayment", "prepayment_2",
"payment", "UPS", "NA"), q1 = c("NA", "201h", "1202", "999",
"500", "50", "NA"), q2 = c("NA", "202h", "202.3", "999", "500",
"51", "NA"), q3 = c("NA", "203h", "99", "1002", "500", "52", "NA"
), q4 = c("NA", "204h", "2922", "1000", "500", "53", "NA")), row.names = c(NA,
-7L), class = c("tbl_df", "tbl", "data.frame"))