0

I have 2 datasets like this:

df1 <- data.frame(name=c(harry, ron, hermione),
             surname = c("potter","weasley","granger")
             )
df2 <- data.frame(surname = c("alton", "alton", "alves",
"weasley","weasley", "weasley, "potter",
"potter", "bell", "granger"),
             house=c("Gryffindor","Ravenclaw", "Hufflepuff","Gryffindor", "Slytherin", "Gryffindor",
"Hufflepuff", "Gryffindor", "Slytherin", "Ravenclaw")
             )

For each person in df1, I want to count the number of people with a common surname in df2 for each house separately. You can think of it as an attempt to trace each person's ancestors' history in Hogwarts by looking at the surnames in each House so far.

So, the outcome should be something like this:

df_new <- data.frame(surname = c("potter","weasley","granger"),
Hufflepuff= c(NUMBER OF COMMON SURNAME,NUMBER OF COMMON SURNAME,NUMBER OF COMMON SURNAME),
Gryffindor= c(NUMBER OF COMMON SURNAME,NUMBER OF COMMON SURNAME,NUMBER OF COMMON SURNAME), 
Slytherin= c(NUMBER OF COMMON SURNAME,NUMBER OF COMMON SURNAME,NUMBER OF COMMON SURNAME), 
Ravenclaw= c(NUMBER OF COMMON SURNAME,NUMBER OF COMMON SURNAME,NUMBER OF COMMON SURNAME)
)

Thank you!

Rabia
  • 3
  • 2
  • 1
    Please don't post data as images. Take a look at how to make a [great reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for ways of showing data. – Martin Gal Aug 10 '21 at 21:07

1 Answers1

1

For example, something like this:

library(tidyverse)

df1 <- data.frame(
  name = c("harry", "ron", "hermione"),
  surname = c("potter", "weasley", "granger")
)
df2 <- data.frame(
  surname = c(
    "alton",
    "alton",
    "alves",
    "weasley",
    "weasley",
    "weasley",
    "potter",
    "potter",
    "bell",
    "granger"
  ),
  house = c(
    "Gryffindor",
    "Ravenclaw",
    "Hufflepuff",
    "Gryffindor",
    "Slytherin",
    "Gryffindor",
    "Hufflepuff",
    "Gryffindor",
    "Slytherin",
    "Ravenclaw"
  )
)
left_join(df1, df2, by = "surname") %>%
  count(name, surname, house) %>%
  pivot_wider(names_from = house, values_from = n, values_fill = 0)
#> # A tibble: 3 x 6
#>   name     surname Gryffindor Hufflepuff Ravenclaw Slytherin
#>   <chr>    <chr>        <int>      <int>     <int>     <int>
#> 1 harry    potter           1          1         0         0
#> 2 hermione granger          0          0         1         0
#> 3 ron      weasley          2          0         0         1
crestor
  • 1,388
  • 8
  • 21