I have a central data frame of information (df3) that I'm trying to subset and add columns to based on data extracted from several columns of another (df2), that itself comes from a subset of a third (df1). I've managed to get so far by searching help and playing around with various functions, but I have reached an impasse. I do hope you can help.
To begin with, the 3dfs are structured as follows:
#df1 - my initial search database
id <- c("id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8")
yesno <- c("Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No")
city <- c("London", "London", "Paris", "London", "Paris", "New York", "London", "London")
df1 <- cbind(id, yesno, city)
df1 <- as.data.frame(df1)
df1
#df2 - containing the data needed to search df3, but situated across columns
id <- c("id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8")
twitter <- c("@one","", "@three", "@four", "", "", "@seven", "")
email <- c("", "", "", "add4", "add5","", "add7", "")
mail <- c("", "postcode2", "", "","","","","postcode8")
df2 <- cbind(id, twitter, email, mail)
df2 <- as.data.frame(df2)
df2
#df3 - the central df containing the data I wish to extract
comms <- c("@one", "postcode2", "@three", "@four", "add4", "add5", "six" "@seven", "add7", "postcode2")
target <- c("text1", "text2", "text3", "text4.1", "text4.2", "text5", "text6", "text7.1","text7.2", "text8")
df3 <- cbind(comms,target)
df3 <- as.data.frame(df3)
df3
The commonality between df1 and df2 is found in the id columns. I've so far been able to filter df1 and extract the ids, which I've then used to subset df2.
df_search <- df1 %>%
filter(yesno == "Yes", city == "London")
df_search_ids <- df_search$id
df2_search <- df2 %>%
filter(id %in% df_search_ids)
df2_search
id twitter email mail
1 id1 @one
2 id2 postcode2
3 id4 @four add4
4 id7 @seven add7
My problems are: the common data between df2 and df3 are spread across three different columns of df2 (twitter, email and mail); these columns contain blank cells and other extraneous info (e.g. 'I am not on Twitter'); and finally that some of the entries in df2 (such as id4 and id7 above) have more than one entry in df3.
The solution I am trying to reach is that I would like to extract all instances from the columns twitter, email and mail of df2 based on a match with the ids extracting from df1, so that the extracted info can then be applied to subset df3 and eventually results in a new df(target_res) that looks like this:
id_res <- c("id1", "id2", "id4", "id4", "id7", "id7")
comms_res <- c("@one", "postcode2", "@four", "add4", "@seven", "add7")
target_res <- c("text1", "text2", "text4.1", "text4.2", "text7.1", "text7.2")
result_df <- cbind(id_res, comms_res, target_res)
result_df <- as.data.frame(result_df)
result_df
id_res comms_res target_res
1 id1 @one text1
2 id2 postcode2 text2
3 id4 @four text4.1
4 id4 add4 text4.2
5 id7 @seven text7.1
6 id7 add7 text7.2
This is an action I will be performing a number of times (based on different explorations of df1), so ideally would be replicable.
I hope this is a clear explanation of the issue.