
I have a data frame that looks like this:

df1
          x           y             Classification
    1   567610  5934630                0
    2   567630  5934630                0
    3   567530  5934610                0
    4   567492.7 5934585               0
    5   567493.3 5934585               0
    6   567492.3 5934584               0
    7   567492.8 5934584               0
    8   567590 5934610                 0

And another one:

df2
     x       y     V1   
1  567610 5934630 16.153   
2  567630 5934630 20.450   
3  567530 5934610  1.175   

Expected output

df2
        x       y     V1      classification
    1  567610 5934630 16.153     0
    2  567630 5934630 20.450     0
    3  567530 5934610  1.175     0

I tried this, but it's not working:

df2 %>% 
  rows_patch(semi_join(df1,df2, by = "x"))

Error in `semi_join()`:
! Input columns in `y` must be unique.
✖ Problem with `x`, `y`, and `V1`.
Run `rlang::last_error()` to see where the error occurred.

I want to compare x and y in both data frames and, where they match, bring the classification from df1 into df2. Every x and y pair in df2 comes from df1, so all of them will match. I just need the classification from df1 added to df2.

Purple_Ad

3 Answers


You could use a left_join instead of rows_patch like this:

library(dplyr)
df2 %>% 
  left_join(semi_join(df1,df2, by = "x"))
#> Joining with `by = join_by(x, y)`
#>        x       y     V1 Classification
#> 1 567610 5934630 16.153              0
#> 2 567630 5934630 20.450              0
#> 3 567530 5934610  1.175              0

Created on 2023-03-29 with reprex v2.0.2

Quinten

Using dplyr:

library(dplyr)

df2 %>%
  left_join(df1, by = c("x", "y"))

Using data.table:

library(data.table)
setDT(df1)
setDT(df2)

df2[df1, on = c("x", "y"), nomatch = 0]

Results:

        x       y     V1 Classification
1: 567610 5934630 16.153              0
2: 567630 5934630 20.450              0
3: 567530 5934610  1.175              0
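
For a self-contained check, the dplyr variant can be run on the question's data. This is only a minimal sketch: the data frames are retyped from the question, and the name result is illustrative rather than anything the question uses.

```r
library(dplyr)

# Data retyped from the question
df1 <- data.frame(
  x = c(567610, 567630, 567530, 567492.7, 567493.3, 567492.3, 567492.8, 567590),
  y = c(5934630, 5934630, 5934610, 5934585, 5934585, 5934584, 5934584, 5934610),
  Classification = 0
)
df2 <- data.frame(
  x  = c(567610, 567630, 567530),
  y  = c(5934630, 5934630, 5934610),
  V1 = c(16.153, 20.450, 1.175)
)

# Keep every row of df2 and attach Classification where both x and y match
result <- df2 %>%
  left_join(df1, by = c("x", "y"))

result
```

Joining on both x and y matters here because several rows of df1 share the same y value, so neither column identifies a row on its own.
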
Merijn van Tilborg

You could join the two data frames on both columns:

left_join(df2, df1, by=c('x'='x', 'y'='y'))
       x       y     V1 Classification
1 567610 5934630 16.153              0
2 567630 5934630 20.450              0
3 567530 5934610  1.175              0
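
Because left_join keeps every row of df2 whether or not it finds a match, it may be worth confirming the question's assumption that every x and y pair in df2 exists in df1, and that no pair is repeated in df1 (a repeated pair would duplicate rows in the result). A sketch of such a check, with the data retyped from the question and unmatched and dup_keys as illustrative names:

```r
library(dplyr)

# Data retyped from the question
df1 <- data.frame(
  x = c(567610, 567630, 567530, 567492.7, 567493.3, 567492.3, 567492.8, 567590),
  y = c(5934630, 5934630, 5934610, 5934585, 5934585, 5934584, 5934584, 5934610),
  Classification = 0
)
df2 <- data.frame(
  x  = c(567610, 567630, 567530),
  y  = c(5934630, 5934630, 5934610),
  V1 = c(16.153, 20.450, 1.175)
)

# Rows of df2 that have no (x, y) partner in df1
unmatched <- anti_join(df2, df1, by = c("x", "y"))

# (x, y) pairs that occur more than once in df1
dup_keys <- df1 %>%
  count(x, y) %>%
  filter(n > 1)

nrow(unmatched)
nrow(dup_keys)
```

If both counts are zero, the joined result should have exactly one row per row of df2.
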
S-SHAAF