So here's the situation. I have data that I would like to clean up.
Here are a few examples of the data contained in the columns of a dataframe:
"NA"
"Border Interception"
"Border Interception, Suspected country of origin: Peru, Interception location: California, USA" <-- Most of the data looks like this. Approximately 90%.
"Border Interception, Suspected country of origin: Puerto Rico"
"Border Interception, Suspected country of origin: Mexico, Interception location: Nogales NZ, Interception Number: APTAZ130741709003"
I've looked at using tidyr's separate() function to split each value on the "," character into two columns, but that's not quite what I'm looking for.
I'd like to create four new columns:
intercept_type with "Border Interception" if present.
suspected_origin with the state and country if present.
intercept_location with the state and country if present.
intercept_number with the interception number if present.
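To make the target concrete, here is a rough base R sketch of the kind of result I'm after, built from the sample values above. The names `df`, `notes`, and `extract_field` are placeholders I made up for illustration, and the pattern assumes every field label consists of letters and spaces followed by ": ". I have only tried it on these five examples, so it may well not be the best approach.

```r
# Sample values copied from the examples above.
# `df` and `notes` are hypothetical names, not the real data frame.
df <- data.frame(
  notes = c(
    "NA",
    "Border Interception",
    "Border Interception, Suspected country of origin: Peru, Interception location: California, USA",
    "Border Interception, Suspected country of origin: Puerto Rico",
    "Border Interception, Suspected country of origin: Mexico, Interception location: Nogales NZ, Interception Number: APTAZ130741709003"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the text after "<label>: " up to the next
# ", <Some label>: " or the end of the string; NA where the label is absent.
extract_field <- function(x, label) {
  pattern <- paste0("(?<=", label, ": ).*?(?=, [A-Za-z ]+: |$)")
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m != -1          # TRUE where the label was found
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)        # regmatches() returns matches only
  out
}

# "Border Interception" when the value starts with it, NA otherwise.
df$intercept_type <- ifelse(grepl("^Border Interception", df$notes),
                            "Border Interception", NA_character_)
df$suspected_origin   <- extract_field(df$notes, "Suspected country of origin")
df$intercept_location <- extract_field(df$notes, "Interception location")
df$intercept_number   <- extract_field(df$notes, "Interception Number")
```

Looking ahead for the next ", Label: " rather than splitting on every comma is what keeps a value such as "California, USA" in one piece.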
Any idea how to best accomplish this?