I want to create a new variable by subtracting one variable from another, where the two variables come from two different datasets. R should compute the difference only for values that refer to the same thing. In my case, the "same thing" is identified by a third variable, COD_PROV, which indicates the city.
Specifically, I need the difference between the Real wage variable of dataset 1 and the Real wage variable of dataset 2, but only where the COD_PROV of the two real wages is the same.
This is an example of dataset 1:
COD_PROV Real wage
1 1962,18
6 1742,85
5 1541,81
96 1612,2
4 1574
3 1823,53
103 1584,49
2 1666,21
7 1747,81
10 2066,42
8 1498,01
11 1871,34
9 1770,41
15 2240,03
16 1729,17
17 1773,38
13 1832,57
Dataset 2 has the same structure, but some values of COD_PROV are missing:
COD_PROV Real wage
1 4962,18
6 1542,85
5 3541,81
4 1564
3 1223,53
2 1446,21
7 1557,81
10 2226,42
8 1458,01
11 1843,34
16 1439,17
17 1883,38
13 1992,57
I've tried this
new <- mutate( dataset1, `Wage Difference ` = dataset1$`Real wage` - dataset2$`Real wage` )
but R of course replies
Error in mutate():
ℹ In argument: Wage difference = ... - dataset1$Real wage.
Caused by error:
! Wage difference must be size 105 or 1, not 106.
Run rlang::last_error() to see where the error occurred.
I suppose the reason is that dataset 2 has fewer observations than dataset 1 (in particular, some values of COD_PROV are missing), so the two columns have different lengths and cannot be subtracted row by row. How can I compute the difference only for matching values of COD_PROV?
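One possible approach, shown here only as a minimal sketch, is to join the two datasets on COD_PROV first and subtract afterwards, so that each row pairs the two wages of the same city. The data frames below are small illustrative subsets of the tables above, with decimal commas assumed to have been read in as numeric values. The suffixes `_1` and `_2` and the column name `Wage difference` are choices made for this example.

```r
library(dplyr)

# Illustrative subsets of the two tables (assumption: wages are already numeric)
dataset1 <- data.frame(
  COD_PROV    = c(1, 6, 5, 96, 4),
  `Real wage` = c(1962.18, 1742.85, 1541.81, 1612.2, 1574),
  check.names = FALSE
)
dataset2 <- data.frame(
  COD_PROV    = c(1, 6, 5, 4),
  `Real wage` = c(4962.18, 1542.85, 3541.81, 1564),
  check.names = FALSE
)

# inner_join() keeps only the COD_PROV values present in both datasets;
# the two "Real wage" columns are disambiguated with the given suffixes
new <- dataset1 %>%
  inner_join(dataset2, by = "COD_PROV", suffix = c("_1", "_2")) %>%
  mutate(`Wage difference` = `Real wage_1` - `Real wage_2`)

print(new)
```

With `inner_join()`, cities that appear in only one dataset (COD_PROV 96 in this subset) are dropped. If those rows should be kept with a missing difference instead, `left_join()` could be used in the same place, in which case `Wage difference` would be `NA` for the unmatched cities.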