0

So I have two large (in some part different) data frames, first is named sifrant and consist of:

   IDP   PREDMET   ID    NAME
1 120_1  ustanove  _18 Barbara
2 120_1  ustanove _301  Mojca 
3 200_1  zakonod  _954  Klara

And other data frame is dalled baza

predmet sifpredm izvajalec1 sifizv1 izvajalec2 sifizv2 izvajalec3 sifizv3 izvajalec4 sifizv4 izvajalec5 sifizv5
ustanove    120_1   Barbara      _18     Mojca     _301     Klara     _954         -2      -2         -2      -2
ustanove    120_1   Barbara      _18     Mojca     _301         -2      -2         -2      -2         -2      -2

Boath data frames should have equal values in next colums: IDP==sifpredm and ID==sifizv1, siizv2, sifizv3, and so on.

Now I would like to match this columns IDP with sifpredm and ID with sifizv1 and sifizv2 and sifizv3 and so on.

Based on match, I would like to print IDP, ID and NAME that do not have match in sifrant (i.e., is missing in sifrant and is present in baza)

Comparing two data frames I can see that in data frame sifrant lecturer Klara is missing IDP 120_1. So my output should print which sifpredm, sifizv and izvajalec are in baza but not in sifrant. In my case sifrant is missing class with ID number 120_1, lecturer ID number _954, lecturer name Klara

So far I have tried

    merge(sifrant, baza, by.x = "IDP",by.y="sifpredm" , all = T)

and

 require(sqldf)
    sifrantNotInbaza <- sqldf('SELECT * FROM baza EXCEPT SELECT * FROM sifrant')

and

library(diffobj)
install.packages("diffobj")
diffPrint(sifrant, baza)
diffObj(sifrant, baza)

I've also looked here

Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2

Here is also dput of both data frames:

structure(list(IDP = c("120_1", "120_1"), PREDMET = c("ustanove", 
" ustanove"), ID = c("_18", "_301"), NAME = c("Barbara", "Mojca "
)), .Names = c("IDP", "PREDMET", "ID", "NAME"), class = "data.frame", row.names = c(NA, 
-2L))


structure(list(predmet = c("ustanove", "ustanove"), sifpredm = c("120_1", 
"120_1"), izvajalec1 = c("Barbara ", "Barbara "), sifizv1 = c("_18", 
"_18"), izvajalec2 = c("Mojca ", "Mojca "), sifizv2 = c("_301", 
"_301"), izvajalec3 = c("Klara ", "-2"), sifizv3 = c("_954", 
"-2"), izvajalec4 = c(-2L, -2L), sifizv4 = c(-2L, -2L), izvajalec5 = c(-2L, 
-2L), sifizv5 = c(-2L, -2L)), .Names = c("predmet", "sifpredm", 
"izvajalec1", "sifizv1", "izvajalec2", "sifizv2", "izvajalec3", 
"sifizv3", "izvajalec4", "sifizv4", "izvajalec5", "sifizv5"), class = "data.frame", row.names = c(NA, 
-2L))
Community
  • 1
  • 1
Miha
  • 2,559
  • 2
  • 19
  • 34

1 Answers1

1

I think you need to correct your dput for sifrant. I copied in the table you've got in your description instead. Then I would do something like:

library(tidyverse)
baza2 <- baza %>% 
           gather(contains("sifizv"), 
                key = "sifizv", 
                value = "ID"
                ) %>% 
           select(1:2,ID) %>% 
           unique()
output <- anti_join(baza2, sifrant, by = c("sifpredm" = "IDP", "ID"))

This tells you the ID of the missing person. You can then look up their name by using match or:

left_join(output, sifrant) %>% select(sifpredm, predmet, ID, NAME)
biomiha
  • 1,358
  • 2
  • 12
  • 25
  • Yes I've missed one row in dput. thx. But your output gives `IDP PREDMET ID NAME 1 200_1 zakonod _954 Klara` but my outpus should be ID of class that is not in sifrant so it should be IDP PREDMET ID NAME 1 120_1 ustanove _954 Klara` – Miha Oct 19 '16 at 21:10
  • No, the result is the same, I stil get `200_1` .... So in my example Klara with IDP 120_1 is in `baza` but not in `sifrant`, so my output should look like this:`120_1 ustanove _954 Klara` because this lecture is not in `sifrat` but is in `baza`. – Miha Oct 19 '16 at 22:06
  • Edited again. Hopefully it should reflect what you need now. – biomiha Oct 20 '16 at 07:27
  • Malenkost stari! lp – biomiha Oct 20 '16 at 09:05
  • Haha :). Legenda :). – Miha Oct 20 '16 at 10:14