Using a dataframe as a regex reference to match string values

Question

Thank you so much for your help in advance.

I have a field named "ERROR_COLAB" in which a series of responses are concatenated into a single long string. Because of the nature of the errors that can appear there, there is no formal, objective, or efficient way to "split" the values in "ERROR_COLAB" in order to classify the responses concatenated in them.

So I was wondering whether I could create a dataframe with the values I need to extract, and then "parse" those values into a regex in order to extract them. To illustrate my idea:

Let's say I have this dataframe:

code_error	meaning
po_R83	No_call_bak
?OP	card_nofunds
HOTELARCH78	overbookings

and I have the following values in "ERROR_COLAB"

ERROR_COLAB
?OP_ERR7+JSU8.OIJK1
po_R83_io
IOS_NEVER:300SSSS
HOTELARCH78?123-

I would like to know whether the first part of each string is equal to any of the values in the field "code_error" of the dataframe containing the codes and meanings. So my desired result would look like this:

ERROR_COLAB	code_error_matched	meaning
?OP_ERR7+JSU8.OIJK1	?OP	card_nofunds
po_R83_io	po_R83	No_call_bak
IOS_NEVER:300SSSS	N.A	N.A
HOTELARCH78?123-	HOTELARCH78	overbookings
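The rule described above (does the string start with one of the codes?) can be checked without regular expressions at all. The following is one possible base R sketch, not taken from this thread: it uses startsWith() and, when several codes apply, keeps the longest one. The helper name longest_prefix is hypothetical, and the question's data are rebuilt with data.frame() so that no packages are needed.

```r
# The question's data, rebuilt in base R (no tibble dependency)
codes <- data.frame(
  code_error = c("po_R83", "?OP", "HOTELARCH78"),
  meaning    = c("No_call_bak", "card_nofunds", "overbookings"),
  stringsAsFactors = FALSE
)
errors <- data.frame(
  ERROR = c("?OP_ERR7+JSU8.OIJK1", "po_R83_io", "IOS_NEVER:300SSSS", "HOTELARCH78?123-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the longest code that `s` starts with, or NA.
# startsWith() compares literally, so "?" in "?OP" needs no escaping.
longest_prefix <- function(s, prefixes) {
  hits <- prefixes[startsWith(s, prefixes)]
  if (length(hits) == 0) NA_character_ else hits[which.max(nchar(hits))]
}

matched <- vapply(errors$ERROR, longest_prefix, character(1),
                  prefixes = codes$code_error, USE.NAMES = FALSE)

# Look up the meaning of each matched code; unmatched rows stay NA
result <- data.frame(
  ERROR = errors$ERROR,
  code_error_matched = matched,
  meaning = codes$meaning[match(matched, codes$code_error)],
  stringsAsFactors = FALSE
)
print(result)
```

Because startsWith() only looks at the beginning of each string, a code that happens to appear in the middle of an error string is not counted as a match.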

Thank you so much, guys! Truly!

data:

library(tibble)  # tribble() is provided by the tibble package

codes <- tribble(~code_error, ~meaning,
  "po_R83",      "No_call_bak",
  "?OP",         "card_nofunds",
  "HOTELARCH78", "overbookings")

errors <- tribble(~ERROR,
  "?OP_ERR7+JSU8.OIJK1",
  "po_R83_io",
  "IOS_NEVER:300SSSS",
  "HOTELARCH78?123-")
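The idea of turning the code table into a regex could be sketched as follows. This is a hypothetical illustration rather than the accepted answer's method, and it assumes PCRE (perl = TRUE) and codes that do not themselves contain the sequence \E. Each code is wrapped in \Q...\E so characters such as the ? in ?OP are taken literally, the alternatives are joined with | and anchored with ^, and regexpr() plus regmatches() pull out the matched code.

```r
# The question's data, rebuilt in base R
codes <- data.frame(
  code_error = c("po_R83", "?OP", "HOTELARCH78"),
  meaning    = c("No_call_bak", "card_nofunds", "overbookings"),
  stringsAsFactors = FALSE
)
errors <- data.frame(
  ERROR = c("?OP_ERR7+JSU8.OIJK1", "po_R83_io", "IOS_NEVER:300SSSS", "HOTELARCH78?123-"),
  stringsAsFactors = FALSE
)

# Longer codes first, so that if one code is a prefix of another
# the alternation tries (and prefers) the longer one
alts <- codes$code_error[order(-nchar(codes$code_error))]

# \Q...\E quotes each code literally in PCRE; ^ anchors to the string start
pattern <- paste0("^(?:", paste0("\\Q", alts, "\\E", collapse = "|"), ")")

# regexpr() gives -1 where nothing matched; regmatches() returns the
# matched text only for the strings that did match, in order
m <- regexpr(pattern, errors$ERROR, perl = TRUE)
matched <- rep(NA_character_, nrow(errors))
matched[m != -1] <- regmatches(errors$ERROR, m)

result <- data.frame(
  ERROR = errors$ERROR,
  code_error_matched = matched,
  meaning = codes$meaning[match(matched, codes$code_error)],
  stringsAsFactors = FALSE
)
print(result)
```

Sorting the alternatives by length matters because PCRE takes the first alternative that matches, not the longest one.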

Refer to https://stackoverflow.com/questions/26405895/how-can-i-match-fuzzy-match-strings-from-two-datasets — Peace Wang, May 03 '21 at 18:54

score 2 · Accepted Answer · answered May 03 '21 at 19:13

A base R option using agrep + merge

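merge(
  transform(
    codes,
    # for each code, the ERROR string(s) that approximately contain it
    # (assumes each code matches exactly one ERROR string)
    ERROR = sapply(code_error, function(x) agrep(x, errors$ERROR, value = TRUE))
  ),
  errors,       # joined on the shared column ERROR
  all = TRUE    # outer join: keeps errors with no matching code as NA
)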

gives

                ERROR  code_error      meaning
1 ?OP_ERR7+JSU8.OIJK1         ?OP card_nofunds
2    HOTELARCH78?123- HOTELARCH78 overbookings
3   IOS_NEVER:300SSSS        <NA>         <NA>
4           po_R83_io      po_R83  No_call_bak
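One caveat worth keeping in mind: agrep() is an approximate (fuzzy) matcher and is not anchored to the start of the string, whereas the question asks about the first part of each string. The short base R check below illustrates both points; the strings po_R84_io and X_po_R83_io are invented for the example, and the handling of max.distance follows R's documentation for agrep().

```r
# agrepl() does approximate substring matching. The default max.distance = 0.1
# is a fraction of the pattern length, so the 6-character pattern "po_R83"
# tolerates ceiling(0.1 * 6) = 1 edit and a one-character typo still matches.
fuzzy <- agrepl("po_R83", "po_R84_io")

# max.distance = 0 allows no edits, so only exact substrings match
strict <- agrepl("po_R83", "po_R84_io", max.distance = 0)

# Substring matching is not anchored: a code found in the middle of a
# string still counts as a match...
unanchored <- grepl("po_R83", "X_po_R83_io", fixed = TRUE)

# ...whereas startsWith() only checks the beginning of the string
anchored <- startsWith("X_po_R83_io", "po_R83")

print(c(fuzzy = fuzzy, strict = strict,
        unanchored = unanchored, anchored = anchored))
```

If error strings can contain near-miss codes, the fuzzy match may attach a meaning the data do not strictly support, so an exact, anchored comparison may be the safer choice for this kind of lookup.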
