1

I have two dataframes with different columns and rows lengths. I want to add two columns in the dataframe A from the dataframe B (columns X and Y) using the "region" column that is present in both dataframes. The figure shows how it looks like and how I would like to get (dataframe C, but can be add to dataframe A as well). I am just showing you the head, but it has a lot of species and the match should be done for all species and the region with x and Y columns. Could someone please help me? enter image description here

Bruna
  • 13
  • 2

2 Answers2

0

You can use dplyr::left_join to add the columns based on region and then use select to select only the columns you need.

library(dplyr)
dataframe_a %>%
  left_join(dataframe_b, by = "region") %>%
  select(species, region, X, Y)

There are a lot of ways to join two tables. I'm only showing you one, but you can find more material here

Edo
  • 7,567
  • 2
  • 9
  • 19
  • Thank you very much for your help!! I do appreciate the suggestions! I am beginner but it will make me learn a lot!! Tks everyone :) – Bruna Nov 18 '21 at 20:52
0

If you left_join you won`t get the desired output:

data.frame(species=dfA$species, region=dfA$region, X=dfB$X, Y=dfB$Y)
species <- (rep("Aa achalensis",6))
region <- c("ABT", "AFG", "AGE", "AGS", "AGW", "ALA")
dfA <- data.frame(species, region)

region <- c("ABT", "AGE", "AGS", "AGW", "ALA", "ALB")
LEVEL3_NAM <- c("Alberta", "Argentina Northeast", "Argentina South", "Argentina Northwest", "Alabama", "Albania")
LEVEL2_COD <- c(71, 85, 85, 85,78, 13)
LEVEL1_COD <- c(7,8,8,8,7,1)
X <- c(1,2,3,4,5,6)
Y <- c(7,8,9,10,11,12)
dfB <- data.frame(region, LEVEL3_NAM, LEVEL2_COD, LEVEL1_COD, X, Y)

left_join(dfA, dfB, by="region") %>% 
  select(species, region, X, Y)

data.frame(species=dfA$species, region=dfA$region, X=dfB$X, Y=dfB$Y)

With left_join

       species region  X  Y
1 Aa achalensis    ABT  1  7
2 Aa achalensis    AFG NA NA
3 Aa achalensis    AGE  2  8
4 Aa achalensis    AGS  3  9
5 Aa achalensis    AGW  4 10
6 Aa achalensis    ALA  5 11

With data.frame(species=dfA$species, region=dfA$region, X=dfB$X, Y=dfB$Y)

        species region X  Y
1 Aa achalensis    ABT 1  7
2 Aa achalensis    AFG 2  8
3 Aa achalensis    AGE 3  9
4 Aa achalensis    AGS 4 10
5 Aa achalensis    AGW 5 11
6 Aa achalensis    ALA 6 12
TarJae
  • 72,363
  • 6
  • 19
  • 66
  • 1
    Thank you very much #TarJae! – Bruna Nov 22 '21 at 22:13
  • OP’s request was to use Region as joining variabile, which doesn’t happen here. Most connections happen “by chance” because the data frames are in the same order. This gets the desired output, however the desired output itself looks off to me. Why would you assign Argentina’s data to Afganistan? That could lead to a serious data quality issue. TarJae, your answer is correct based on the request that is being made, however @Bruna you should check your request again, imo. – Edo Nov 23 '21 at 07:09