I have two dataframes with different columns and rows lengths. I want to add two columns in the dataframe A from the dataframe B (columns X and Y) using the "region" column that is present in both dataframes. The figure shows how it looks like and how I would like to get (dataframe C, but can be add to dataframe A as well). I am just showing you the head, but it has a lot of species and the match should be done for all species and the region with x and Y columns. Could someone please help me? enter image description here
Asked
Active
Viewed 52 times
1
-
check out `dplyr::left_join` – Edo Nov 18 '21 at 20:40
-
Welcome. Please share your data.frames using ``dput()`` rather than images, as sharing them as images makes it very difficult for people to help. Thank you. – user438383 Nov 18 '21 at 20:40
-
Maybe [this](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) helps. – Andre Wildberg Nov 18 '21 at 20:44
2 Answers
0
You can use dplyr::left_join
to add the columns based on region
and then use select
to select only the columns you need.
library(dplyr)
dataframe_a %>%
left_join(dataframe_b, by = "region") %>%
select(species, region, X, Y)
There are a lot of ways to join two tables. I'm only showing you one, but you can find more material here

Edo
- 7,567
- 2
- 9
- 19
-
Thank you very much for your help!! I do appreciate the suggestions! I am beginner but it will make me learn a lot!! Tks everyone :) – Bruna Nov 18 '21 at 20:52
0
If you left_join
you won`t get the desired output:
data.frame(species=dfA$species, region=dfA$region, X=dfB$X, Y=dfB$Y)
species <- (rep("Aa achalensis",6))
region <- c("ABT", "AFG", "AGE", "AGS", "AGW", "ALA")
dfA <- data.frame(species, region)
region <- c("ABT", "AGE", "AGS", "AGW", "ALA", "ALB")
LEVEL3_NAM <- c("Alberta", "Argentina Northeast", "Argentina South", "Argentina Northwest", "Alabama", "Albania")
LEVEL2_COD <- c(71, 85, 85, 85,78, 13)
LEVEL1_COD <- c(7,8,8,8,7,1)
X <- c(1,2,3,4,5,6)
Y <- c(7,8,9,10,11,12)
dfB <- data.frame(region, LEVEL3_NAM, LEVEL2_COD, LEVEL1_COD, X, Y)
left_join(dfA, dfB, by="region") %>%
select(species, region, X, Y)
data.frame(species=dfA$species, region=dfA$region, X=dfB$X, Y=dfB$Y)
With left_join
species region X Y
1 Aa achalensis ABT 1 7
2 Aa achalensis AFG NA NA
3 Aa achalensis AGE 2 8
4 Aa achalensis AGS 3 9
5 Aa achalensis AGW 4 10
6 Aa achalensis ALA 5 11
With data.frame(species=dfA$species, region=dfA$region, X=dfB$X, Y=dfB$Y)
species region X Y
1 Aa achalensis ABT 1 7
2 Aa achalensis AFG 2 8
3 Aa achalensis AGE 3 9
4 Aa achalensis AGS 4 10
5 Aa achalensis AGW 5 11
6 Aa achalensis ALA 6 12

TarJae
- 72,363
- 6
- 19
- 66
-
1
-
OP’s request was to use Region as joining variabile, which doesn’t happen here. Most connections happen “by chance” because the data frames are in the same order. This gets the desired output, however the desired output itself looks off to me. Why would you assign Argentina’s data to Afganistan? That could lead to a serious data quality issue. TarJae, your answer is correct based on the request that is being made, however @Bruna you should check your request again, imo. – Edo Nov 23 '21 at 07:09