So I have two large (in some part different) data frames, first is named sifrant and consist of:
IDP PREDMET ID NAME
1 120_1 ustanove _18 Barbara
2 120_1 ustanove _301 Mojca
3 200_1 zakonod _954 Klara
And other data frame is dalled baza
predmet sifpredm izvajalec1 sifizv1 izvajalec2 sifizv2 izvajalec3 sifizv3 izvajalec4 sifizv4 izvajalec5 sifizv5
ustanove 120_1 Barbara _18 Mojca _301 Klara _954 -2 -2 -2 -2
ustanove 120_1 Barbara _18 Mojca _301 -2 -2 -2 -2 -2 -2
Boath data frames should have equal values in next colums: IDP==sifpredm
and ID==sifizv1, siizv2, sifizv3, and so on
.
Now I would like to match this columns
IDP with sifpredm
and
ID with sifizv1 and sifizv2 and sifizv3
and so on.
Based on match, I would like to print IDP, ID and NAME
that do not have match in sifrant (i.e., is missing in sifrant and is present in baza)
Comparing two data frames I can see that in data frame sifrant
lecturer Klara is missing IDP 120_1. So my output should print which sifpredm, sifizv and izvajalec are in baza
but not in sifrant
. In my case sifrant is missing class with ID number 120_1, lecturer ID number _954, lecturer name Klara
So far I have tried
merge(sifrant, baza, by.x = "IDP",by.y="sifpredm" , all = T)
and
require(sqldf)
sifrantNotInbaza <- sqldf('SELECT * FROM baza EXCEPT SELECT * FROM sifrant')
and
library(diffobj)
install.packages("diffobj")
diffPrint(sifrant, baza)
diffObj(sifrant, baza)
I've also looked here
Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2
Here is also dput of both data frames:
structure(list(IDP = c("120_1", "120_1"), PREDMET = c("ustanove",
" ustanove"), ID = c("_18", "_301"), NAME = c("Barbara", "Mojca "
)), .Names = c("IDP", "PREDMET", "ID", "NAME"), class = "data.frame", row.names = c(NA,
-2L))
structure(list(predmet = c("ustanove", "ustanove"), sifpredm = c("120_1",
"120_1"), izvajalec1 = c("Barbara ", "Barbara "), sifizv1 = c("_18",
"_18"), izvajalec2 = c("Mojca ", "Mojca "), sifizv2 = c("_301",
"_301"), izvajalec3 = c("Klara ", "-2"), sifizv3 = c("_954",
"-2"), izvajalec4 = c(-2L, -2L), sifizv4 = c(-2L, -2L), izvajalec5 = c(-2L,
-2L), sifizv5 = c(-2L, -2L)), .Names = c("predmet", "sifpredm",
"izvajalec1", "sifizv1", "izvajalec2", "sifizv2", "izvajalec3",
"sifizv3", "izvajalec4", "sifizv4", "izvajalec5", "sifizv5"), class = "data.frame", row.names = c(NA,
-2L))