-3

I have 2 data frames. First has more number of rows and one ID column "ALP23456" and other related columns. Second has lesser number of rows and the ID value is present as a comment "ALP23456 done on 26th March". This is a free text and no pattern decipherable.

Problem: I want to reference ID column from Data Frame 1 into Data Frame 2 Text column to get some information from Data frame 2. Facing issue as it is not an exact match.

enter image description here

enter image description here

Solution I want: enter image description here

glennsl
  • 28,186
  • 12
  • 57
  • 75
Koyeli
  • 67
  • 1
  • 9

1 Answers1

0

I have used regular expression and merging of the two dataframes as shown below:

library(stringr)
library(dplyr)

df2$ID <- str_trim(str_extract(df2$Text, pattern = "Q\\S*|A\\S*"))
df <- left_join(df1, df2, by = "ID")
Lunalo John
  • 325
  • 3
  • 10
  • Can you update how to extract : 4 characters(can be from a-z) "_" 9 numeric values. example - NQRT_876789567 (actual format of ID) – Koyeli Mar 20 '19 at 09:31
  • replace the pattern with `"[A-Z]{4}_[0-9]{9}"` as shown below: `df2$ID <- str_trim(str_extract(df2$Text, pattern = "[A-Z]{4}_[0-9]{9}"))` – Lunalo John Mar 20 '19 at 09:40
  • Can I put an OR operator as well in pattern? If for example there are 2 formats like ACB_765876567 and ACBH_765456789 – Koyeli Mar 20 '19 at 09:45
  • Yes, It is possible to use or operator. It is similar to r syntax in r `|` – Lunalo John Mar 20 '19 at 09:48
  • In this case you can specify the minimum and maximum since only alpha part is changing :`df2$ID <- str_trim(str_extract(df2$Text, pattern = "[A-Z]{3, 4}_[0-9]{9}"))` – Lunalo John Mar 20 '19 at 09:49