I have 3 tables and want to compare them two by two to find which elements are missing.

My tables are:

> BaseIda
  Id Quant
1  1     a
2  2     b
3  3     c
4  4     d
5  5     e
6  6     f
7  7     g
> IdaEmpA
  RespA QuantA
1     1     11
2     2     13
3     3     15
4     4      3
5     5     18
6     6      1
7     7      1
> IdaEmpB
  RespB QuantB
1     1     18
2     2     14
3     3     21
4     4      2
5     6     13
6     7      3

I need to compare BaseIda$Id with IdaEmpA$RespA and IdaEmpB$RespB and then point out which values are missing, given that BaseIda$Id always has all values. I found the post below useful but could not get it to answer my question: Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2

I tried this:

comparacaoA <- compare(BaseIda$Id,IdaEmpA$RespA)
comparacaoB <- compare(BaseIda$Id,IdaEmpA$RespB)

I am not using allowAll=TRUE because, from reading the help file, I believed it was not necessary.

I am getting this result:

> comparacaoA
TRUE
> comparacaoB
FALSE

This is correct, as IdaEmpA$RespA has all the values, while IdaEmpB$RespB is missing the value 5.

But when I try to see which values matched, I get this:

> comparacaoA$tM
[1] 1 2 3 4 5 6 7
> comparacaoB$tM
[1] 1 2 3 4 5 6 7

I thought this could be because I did not use allowAll=TRUE, so I tried again with it and got this:

comparacaoA <- compare(BaseIda$Id,IdaEmpA$RespA,allowAll=TRUE)
comparacaoB <- compare(BaseIda$Id,IdaEmpA$RespB,allowAll=TRUE)
> comparacaoA
TRUE
> comparacaoB
FALSE
  coerced from <NULL> to <integer>
  shortened model
  sorted
> comparacaoA$tM
[1] 1 2 3 4 5 6 7
> comparacaoB$tM
[1] 1

The expected result is:

> comparacaoA$tM
[1] 1 2 3 4 5 6 7
> comparacaoB$tM
[1] 1 2 3 4 6 7

Can someone help me understand what I am missing? What am I doing wrong?

Spartacus Rocha

2 Answers

Regarding your code using the compare package, you simply have a typo. See the following:

comparacaoB <- compare(BaseIda$Id,IdaEmpA$RespB,allowAll=TRUE)

You need to change IdaEmpA$RespB to IdaEmpB$RespB and it will work fine.

compare(BaseIda$Id, IdaEmpB$RespB, allowAll=TRUE)$tM

However, there are many base R solutions. Assuming the Ids are in increasing order as you show and BaseIda$Id is completely sequential (1, 2, 3, ...), you could simply use which, because the positions it returns then coincide with the Id values.

BaseIda <- data.frame(Id=seq(7), Quant=letters[seq(7)])
IdaEmpA <- data.frame(RespA=seq(7), QuantA=c(11,13,15,3,18,1,1))
IdaEmpB <- data.frame(RespB=c(1:4, 6:7), QuantB=c(18,14,21,2,13,3))

which(BaseIda$Id %in% IdaEmpA$RespA)
[1] 1 2 3 4 5 6 7
which(BaseIda$Id %in% IdaEmpB$RespB)
[1] 1 2 3 4 6 7

Otherwise, a more general solution is to index the 'source' vector, which returns the matching values themselves rather than their positions.

BaseIda$Id[BaseIda$Id %in% IdaEmpA$RespA]
BaseIda$Id[BaseIda$Id %in% IdaEmpB$RespB]
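
To illustrate why indexing is the more general approach, here is a minimal sketch using hypothetical Ids that are not sequential (the vectors base_ids and resp_ids are made up for this illustration and are not part of the question's data). In that case which gives positions, while indexing gives the Ids themselves:

```r
# Hypothetical Ids that are NOT 1..n, to contrast positions with values
base_ids <- c(10, 20, 30, 40)
resp_ids <- c(10, 30, 40)

# which() returns the positions in base_ids that have a match in resp_ids
matched_pos <- which(base_ids %in% resp_ids)
print(matched_pos)   # 1 3 4

# Logical indexing returns the matching values themselves
matched_val <- base_ids[base_ids %in% resp_ids]
print(matched_val)   # 10 30 40
```

With the sequential Ids from the question, both approaches happen to print the same numbers, which is why which works there.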

If you are looking for a function that returns the elements common to both vectors, you can also use intersect.

intersect(BaseIda$Id, IdaEmpA$RespA)
intersect(BaseIda$Id, IdaEmpB$RespB)
cdeterman

If you are only interested in finding the values that are in BaseIda$Id but not in IdaEmpA$RespA or IdaEmpB$RespB, you can use the setdiff function on the vectors.
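
As a minimal sketch of the setdiff approach, using the data from the question (the variable names missing_in_A and missing_in_B are chosen here only for illustration). Note that argument order matters: setdiff(x, y) returns the elements of x that are not in y, so the complete vector goes first.

```r
# Data as given in the question
BaseIda <- data.frame(Id = 1:7, Quant = letters[1:7])
IdaEmpA <- data.frame(RespA = 1:7, QuantA = c(11, 13, 15, 3, 18, 1, 1))
IdaEmpB <- data.frame(RespB = c(1:4, 6:7), QuantB = c(18, 14, 21, 2, 13, 3))

# Ids present in BaseIda but absent from each response table
missing_in_A <- setdiff(BaseIda$Id, IdaEmpA$RespA)
missing_in_B <- setdiff(BaseIda$Id, IdaEmpB$RespB)

print(missing_in_A)  # integer(0): nothing is missing
print(missing_in_B)  # 5
```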

NicE
  • `setdiff` will return which elements are different, not which are the same (which is what the OP is asking). The function you are likely looking for is `intersect`. – cdeterman Jan 27 '15 at 13:56
  • I think he also needs what is different according to what he wrote under his first code: _I need to compare BaseIda$Id with IdaEmpA$RespA and IdaEmpB$RespB and after that, **point which value is missing**, considering BaseIda$Id Always have all values._ Just unsure if what is different is his final goal or not – NicE Jan 27 '15 at 13:57
  • Perfect, using both answers I did all I needed. Both helped me a lot. Many thanks! – Spartacus Rocha Jan 27 '15 at 14:06