I have a very basic question, but I don't really know how this can happen:

I have two tibbles/data.frames called "data1" and "data2". Now, I would like to keep "data1" and add all values in "data2" where the columns "variable1" and "variable2" (which appear in both tibbles) are identical. Therefore I do a left_join:

library(dplyr)
newData <- left_join(data1, data2, by = c("variable1", "variable2"))

However, if I check the number of rows, nrow(data1) is smaller than nrow(newData). How can this be, and why does the number of cases increase?

D. Studer
  • Please show us input datasets and this *loss of cases*. As of now, we'll take your word for it. – Parfait Sep 21 '17 at 18:43
  • Excuse me, it's not a LOSS. It's an INCREASE: from 17116 to 17139 cases. Unfortunately, I cannot show the original data as it is secret. – D. Studer Sep 21 '17 at 18:50
  • Consider editing even the title as that is a very important overlook. And see this [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Parfait Sep 21 '17 at 18:54
  • Do you have variables that are `NA` in `variable1` and `variable2`? – Benjamin Sep 21 '17 at 18:56
  • Just checked it... there is only one NA in data2$variable2. – D. Studer Sep 21 '17 at 18:59

1 Answer

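It just means that data2 has multiple rows with the same combination of variable1 and variable2 for some entries in data1. left_join returns one output row per match, so each such row of data1 appears more than once in the result.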
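
A minimal sketch of that effect, using made-up toy data (the values and the columns x and y are assumptions for illustration, not the asker's data). It shows the row count growing after the join and one possible way to list the key combinations that occur more than once in data2:

```r
library(dplyr)

# Hypothetical toy data: the key (2, 20) appears twice in data2
data1 <- data.frame(variable1 = c(1, 2, 3),
                    variable2 = c(10, 20, 30),
                    x = c("a", "b", "c"))
data2 <- data.frame(variable1 = c(1, 2, 2),
                    variable2 = c(10, 20, 20),
                    y = c("p", "q", "r"))

newData <- left_join(data1, data2, by = c("variable1", "variable2"))

nrow(data1)    # 3
nrow(newData)  # 4: the data1 row with key (2, 20) matched two rows of data2

# List the key combinations that occur more than once in data2
dupKeys <- data2 %>%
  count(variable1, variable2) %>%
  filter(n > 1)
dupKeys
```

If dupKeys comes back empty for the real data, the duplication has another cause. If it does not, the next step is to decide which of the duplicated rows in data2 to keep (or to aggregate them) before joining.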

Mouad_Seridi