-1

I have 2000 lines of HR dataset and need to append column after grepping the string pattern. I want to match (sometimes they are not exact matches) edu column from df2 to df1, and print the respective Dep rows.

Also, when there is no match of edu pattern in df1, paste the same Dep string again instead of NA and vice versa (as in last 2 lines of expected result). Any suggestions? Thanks.

df2 <- data.frame(Dep = c("Research & Development", "Sales", "Research & Development", "Research & Development", "sales", "sales"), Edu = c("Life Sciences", "Marketing", "paramedical", "Other", "Technical studies","Business"))

df1 <- data.frame(Dep = c("Sales", "Sales", "Research & Development", "Research & Development", "Human Resources", "Research & Development", "legal section"), Edu = c("Life Sciences", "Marketing", "Medical", "Other", "Human Resources", "Technical", "Law"))

Expected output

        Dep_df1          Edu_df1_df2          Dep_df2
        Sales          Life Sciences          Research & Development 
        Sales          Marketing          Sales
        Research & Development          Medical          Research & Development 
        Research & Development          Other          Research & Development 
        Human Resources          Human Resources          Human Resources 
        Research & Development          Technical          Sales
        legal section          Law          legal section
        sales          Business           sales
Community
  • 1
  • 1
amar Par
  • 3
  • 3

2 Answers2

0

A possible way - join using dplyr. It will result in column names with .x and .y appended to same named columns.

library(dplyr)
df1 <- data.frame(Dep = c("S", "S", "R"), Edu = c("LS", "M", "O"))
df2 <- data.frame(Dep = c("G", "L", "Q"), Edu = c("LS", "M", "O"))

df2 %>% left_join(df1, by = c("Education")
UpsideDownRide
  • 528
  • 1
  • 4
  • 12
  • Tried, but it didn't work. Appends only NAs not the string. – amar Par Jul 12 '18 at 11:01
  • I have added reproducible data to my answer. If you would do the same to your question then it would be easier to understand what are you trying to do precisely. – UpsideDownRide Jul 12 '18 at 11:06
  • 1
    thanks. My data frame has 2000 entries and some of them has "Medi" instead of "Medical" and "Technical" instead of "Technical Degree" etc. So I have to match the string as they are not exactly the same length. And need to add the rows if they are missing in one of the input file (as in last 2 lines of expected result). How to include these too. I will update the question. Thanks. – amar Par Jul 12 '18 at 11:17
  • Again - reproducible example would be helpful. You don't need to provide all of the data but boil it down to the things that exemplify the problem you are dealing with. For what you describe two ideas are following: either fill in the names properly so that you can join, and then you could fill in the missing values. Other option is to write a function that would do to the operaiton you need and map it over the rows. But without the data it's too much of a hassle to grapple with it. – UpsideDownRide Jul 12 '18 at 11:23
  • Thanks again @ UpsideDownRide: edited my example. Filling the proper names is bit complicated but I am sure, there are partial string matches between the df1 and df2 edu columns. – amar Par Jul 12 '18 at 12:23
0

After some tries, this worked.

dd=merge(df1, df2[, c("Edu", "Dep")], by="Edu", all.x = TRUE) 
transform(dd, dep.yfill = pmax(Dep.x, Dep.y, na.rm = TRUE))
amar Par
  • 3
  • 3