I would like to ask how to perform left joint operation with 2 datasets when variable to joint by has different string.
I know it can be done "by-hand" but I would prefer more efficient approach
I provided 2 datasets in HERE.
Here are some rows from each data set.
all_data
GID_0 NAME_0 GID_1 NAME_1 NL_NAME_1 GID_2 provstate VARNAME_2 NL_NAME_2 TYPE_2 ENGTYPE_2 CC_2
1 FRA France FRA.1_1 Auvergne-Rhône-Alpes <NA> FRA.1.1_1 Ain <NA> <NA> Département Department 01
2 FRA France FRA.1_1 Auvergne-Rhône-Alpes <NA> FRA.1.2_1 Allier Basses-Alpes <NA> Département Department 03
3 FRA France FRA.1_1 Auvergne-Rhône-Alpes <NA> FRA.1.3_1 Ardèche <NA> <NA> Département Department 07
4 FRA France FRA.1_1 Auvergne-Rhône-Alpes <NA> FRA.1.4_1 Cantal <NA> <NA> Département Department 15
5 FRA France FRA.1_1 Auvergne-Rhône-Alpes <NA> FRA.1.5_1 Drôme <NA> <NA> Département Department 26
6 FRA France FRA.1_1 Auvergne-Rhône-Alpes <NA> FRA.1.6_1 Haute-Loire <NA> <NA> Département Department 43
7 FRA France FRA.1_1 Auvergne-Rhône-Alpes <NA> FRA.1.7_1 Haute-Savoie <NA> <NA> Département Department 74
8 FRA France FRA.1_1 Auvergne-Rhône-Alpes <NA> FRA.1.8_1 Isère <NA> <NA> Département Department 38
9 FRA France FRA.1_1 Auvergne-Rhône-Alpes <NA> FRA.1.9_1 Loire <NA> <NA> Département Department 42
10 FRA France FRA.1_1 Auvergne-Rhône-Alpes <NA> FRA.1.10_1 Puy-de-Dôme <NA> <NA> Département Department 63
data_to_insert
provstate count
<chr> <int>
1 Alpes-Maritimes 13
2 Alsace 13
3 Aquitaine 119
4 Aude 1
5 Auvergne 1
6 Auvergne-Rhone-Alpes 10
7 Bas-Rhin 5
8 Bouches-du-Rhone 9
9 Bourgogne-Franche-Comte 6
10 Brittany 23
I need to connect datasets that dataframe called all_data will contain new variable called count that is found in data_to_insert dataframe - joint should be based on provstate variable. (provstatet that do not have values in data_to_insert dataframe should have value 0 in new left joined dataframe)
However names are slightly different in both dataset.
I would like to ask for any advice