-2

I have a CSV file with a column that contains a variable-length list of items separated by a |.

I use the code below:

violations <- inspections %>% head(100) %>% 
  select(`Inspection ID`,Violations) %>% 
  separate_rows(Violations,sep = "|")

But this creates a new row for every single character in the field (including spaces).

What am I missing here on how to separate this column?

Abdessabour Mtk
dfaberjob
  • Please add data using `dput` and show the expected output for the same. Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Aug 24 '20 at 00:18
  • Possible duplicate https://stackoverflow.com/questions/27721008/how-do-i-deal-with-special-characters-like-in-my-regex – Ronak Shah Aug 24 '20 at 00:18
  • It's probably because `separate_rows` interprets `sep` as a regex; try `separate_rows(Violations, sep = '\\|')` – Abdessabour Mtk Aug 24 '20 at 00:21
  • Appreciate the feedback on how my question can be better and will look to increase the information presented in any future ones. – dfaberjob Aug 24 '20 at 00:24
  • It seems like the `'\\|'` suggestion solved my problem. I appreciate the suggestion, Abdessabour. – dfaberjob Aug 24 '20 at 00:24

3 Answers

4

It's hard to help without a better description of your data and an example of what the correct output would look like. That said, I think part of your confusion is due to the documentation in separate_rows. A similar function, separate, documents its sep argument as:

If character, sep is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.

The documentation for the sep argument of separate_rows doesn't say the same thing, though I think it behaves the same way. In regular expressions, | is the alternation operator, so to match a literal pipe it must be escaped as \| in the regex, which is written "\\|" in an R string.

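library(tidyr)  # provides separate_rows() and re-exports tibble()

# Small example data: one pipe-delimited Violations string per inspection
df <- tibble(
  Inspection_ID = c(1, 2, 3),
  Violations = c("A", "A|B", "A|B|C"))
# Escape the pipe so it is matched literally rather than as regex alternation
separate_rows(df, Violations, sep = "\\|")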

Yields

# A tibble: 6 x 2
  Inspection_ID Violations
          <dbl> <chr>     
1             1 A         
2             2 A         
3             2 B         
4             3 A         
5             3 B         
6             3 C      
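
To see why the unescaped pattern produced one row per character, here is a minimal sketch using base R's strsplit, which also treats its pattern as a regular expression by default. It is an illustration only: the input string is made up, and strsplit stands in for separate_rows, so the internals may differ even though the symptom is the same.

```r
# Illustrative input, not taken from the question's data.
x <- "A|B|C"

# Unescaped: the regex "|" is an alternation of two empty patterns,
# so it matches the empty string and every character becomes its own piece.
unescaped <- strsplit(x, "|")[[1]]

# Escaped: the R string "\\|" is the regex \|, which matches a literal pipe.
escaped <- strsplit(x, "\\|")[[1]]

# A character class is another way to match a literal pipe.
bracketed <- strsplit(x, "[|]")[[1]]

print(unescaped)  # "A" "|" "B" "|" "C"
print(escaped)    # "A" "B" "C"
print(bracketed)  # "A" "B" "C"
```

Either "\\|" or "[|]" should work as the sep argument; which one to use is mostly a matter of readability.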
amoeba
0

Not sure what your data looks like, but you may want to replace sep = "|" with sep = "\\|". Good luck!
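
Since the question mentions spaces, one possible variation is to let the separator absorb any whitespace around the pipe. The sketch below assumes the values look like "A | B"; the data frame and its column names are hypothetical, not the asker's actual data.

```r
library(tidyr)

# Hypothetical data: pipe-delimited values with spaces around the delimiter.
df <- data.frame(
  Inspection_ID = c(1, 2),
  Violations = c("A", "B | C"),
  stringsAsFactors = FALSE
)

# "\\s*\\|\\s*" matches a literal pipe plus any surrounding whitespace,
# so the resulting pieces need no extra trimming.
out <- separate_rows(df, Violations, sep = "\\s*\\|\\s*")
print(out)
```

If the real data has no spaces around the pipe, plain "\\|" is enough.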

BellmanEqn
0

Using sep = "\\|" with the separate_rows function allowed me to separate the pipe-delimited values.

dfaberjob