
So here's the situation. I have data that I would like to clean up.

Here are a few examples of the data contained in the columns of a dataframe:

"NA"

"Border Interception"

"Border Interception, Suspected country of origin: Peru, Interception location: California, USA" <-- Most of the data looks like this. Approximately 90%.

"Border Interception, Suspected country of origin: Puerto Rico"

"Border Interception, Suspected country of origin: Mexico, Interception location: Nogales NZ, Interception Number: APTAZ130741709003"

I've looked at tidyr's separate() function to split on the "," character into two columns, but that's not quite what I'm looking for.

I'd like to create four new columns:

intercept_type with "Border Interception" if present.

suspected_origin with the state and country if present.

intercept_location with the state and country if present.

intercept_number with the interception number if present.

Any idea how to best accomplish this?

  • If you post the output of dput(head(your_data, n = 6)), it will greatly simplify playing around with the data. The row where the country of origin is Puerto Rico is something of a head-scratcher. – Chris Nov 05 '21 at 01:26
  • If you examine your data, you'll see you generally want the text after a `:` and before the following `,` or `.`. The phrases such as 'Suspected country of origin:' or 'Interception location:' are 'fixed', so fixed = TRUE, but this [select between regex](https://stackoverflow.com/questions/23503448/extract-a-string-between-patterns-delimiters-in-r) will get you there if this doesn't get closed as a duplicate. – Chris Nov 05 '21 at 01:59
  • 'NA' likely means an interception in the interior, and suggests another table of interest. – Chris Nov 05 '21 at 02:03
  • Please provide enough code so others can better understand or reproduce the problem. – Community Nov 05 '21 at 09:24
  • Apologies. Here is a github repository with the code and the dataset. https://github.com/hominidae/IBIO-6000 – BryanVandenbrink Nov 05 '21 at 13:56
  • There are two columns with string data I would like to tidy up. The columns collection_note and notes both have information that I would like in their own columns. – BryanVandenbrink Nov 05 '21 at 13:57
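
For anyone landing here, the approach hinted at in the comments (take the text after each fixed label and `:` up to the next label) could be sketched in base R roughly as follows. This is only a sketch under assumptions: the data frame `df`, its contents, and the helper `extract_field()` are made up for illustration from the examples in the question, and only the column name `collection_note` comes from the comment above. It has not been run against the real dataset.

```r
# Hypothetical sample data modelled on the examples in the question.
df <- data.frame(
  collection_note = c(
    "NA",
    "Border Interception",
    "Border Interception, Suspected country of origin: Peru, Interception location: California, USA",
    "Border Interception, Suspected country of origin: Puerto Rico",
    "Border Interception, Suspected country of origin: Mexico, Interception location: Nogales NZ, Interception Number: APTAZ130741709003"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the text that follows "<label>:" and runs up to
# the next ", <some other label>:" or the end of the string.
# Assumes `label` contains no regex metacharacters.
extract_field <- function(x, label) {
  pattern <- paste0(label, ":\\s*(.*?)(?=,\\s*[^,:]+:|$)")
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(
    hits,
    function(hit) if (length(hit) >= 2) trimws(hit[2]) else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

note <- df$collection_note

# "Border Interception" where the phrase is present, NA otherwise.
df$intercept_type <- ifelse(
  grepl("Border Interception", note, fixed = TRUE),
  "Border Interception",
  NA_character_
)
df$suspected_origin   <- extract_field(note, "Suspected country of origin")
df$intercept_location <- extract_field(note, "Interception location")
df$intercept_number   <- extract_field(note, "Interception Number")
```

The lookahead stops a field at the next `, Label:` rather than at the first comma, so a location such as "California, USA" should stay in one piece. Rows that lack a given label get NA in that column.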
