I have two datasets like this:

  1. Population by country

Country      Population
America      value
Argentina    value
Australia    value
Brazil       value
Japan        value
...

  2. Landmass by country

Country      Landmass
Argentina    value
Mexico       value
Uruguay      value
Maldives     value
...

The two sets have different numbers of rows and different country names. Is there a way to combine them by adding the Landmass column to the matching country in the population set? It doesn't matter if a country from the landmass set isn't present in the population set; I only want to fill in the countries that are there.

IRTFM
MakeTheErrorDie
  • Please provide an actual example. Do not use the word "value" as a stand-in. Also show it as the output of dput(X), where X is the input. See the guidance at the top of the [tag:r] tag page. – G. Grothendieck Aug 11 '21 at 15:32
  • This would easily have been accomplished with `merge`. – IRTFM Aug 11 '21 at 15:34

1 Answer

I think what you're after is a left_join, which according to the docs:

https://dplyr.tidyverse.org/reference/join.html

returns all rows from x, and all columns from x and y. Rows in x with no match in y get NA in the new columns, i.e.

# Example data standing in for the two sets in the question
pops <- data.frame(
  Country    = c("America", "Argentina", "Australia", "Brazil", "Japan"),
  Population = seq(100, 200, 25)
)

landmass <- data.frame(
  Country  = c("Argentina", "Mexico", "Uruguay", "Maldives"),
  Landmass = seq(1250, 2000, 250)
)

# Keep every row of pops; add Landmass where Country matches
dplyr::left_join(pops, landmass, by = "Country")

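yields

[image: joined table, with all five rows of pops and a Landmass value only for Argentina; the other rows are NA]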
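
The same kind of left join can also be done in base R with `merge`, as suggested in the comments on the question. This is only a minimal sketch on the same made-up example data; the name `merged` is just illustrative. Setting `all.x = TRUE` keeps every row of the first data frame, and countries that appear only in `landmass` are dropped.

```r
# Made-up example data, same shape as in the question
pops <- data.frame(
  Country    = c("America", "Argentina", "Australia", "Brazil", "Japan"),
  Population = seq(100, 200, 25)
)

landmass <- data.frame(
  Country  = c("Argentina", "Mexico", "Uruguay", "Maldives"),
  Landmass = seq(1250, 2000, 250)
)

# all.x = TRUE keeps all rows of pops (a left join);
# rows with no match in landmass get NA for Landmass
merged <- merge(pops, landmass, by = "Country", all.x = TRUE)
print(merged)
```

Note that `merge` sorts the result by the key column by default (`sort = TRUE`), so the row order can differ from what `left_join` returns on other data.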

dcurrie27