My Problem is that I have a column of data (characters & numbers combined) and I want to check for each field if it partially matches with one of the fields in a column of a different df, and if it does write the corresponding value from the second df into a new column of the first variable.
So more practically:
df1 (2x60k)=
QST_ID QST_CODING
1 M79.68
2 A01.1
3 K07
4 Z09
df2 (2x2451)=
icd name
A01 intestinal diseases
...
K07 Diseases of oral cavity
And the aim would be to have df1 =
QST_ID QST_CODING QST_CODING_RC_NAME
1 M79.68 diseaseX
2 A01.1 Intestinal diseases
3 K07 Diseases of oral cavity
4 Z09 diseaseY
The easiest fix would be to ignore everything after the . and basically treat the 'icd' as icd* But I don't get this to work for the whole list.
I tried doing this with the match, pmatch and grep functions and tried prefixing all icd with ^ (regex wildcat) but it didn't work at all:
df1$QST_CODING_RC_name <- df2$name[pmatch(as.character(df1$QST_CODING), df2$icd, duplicates.ok = T)]
df1$QST_CODING_RC_name <- df2$name[grep(df2$icd, as.character(df1$QST_CODING), ignore.case = T)]
I would be very happy if anyone has a solution for this. Thanks a lot for your help.
Edit: QST_CODING contains values of a different form(g.523, etc.), where the information after the point is necessary to work with later and cannot simply be dropped. Furthermore, the length of bothe the part in front of the point as well as the part after it differs in QST_CODING.