My question is about whitespace in R. There have been many questions regarding whitespace in R, but I haven't found any about types of whitespace producing inconsistent behavior.
I scraped a table from Wikipedia and was trying to split a column on its whitespace (e.g., Minnesota 6) into two columns (c(Minnesota, 6)). I tried tidyr's separate() function and got the maddening warning Expected 2 pieces. Missing pieces filled with NA in 364 rows ...
It seems that separate() does not recognize the whitespace before the number as whitespace. Interestingly, it does recognize the whitespace when it's in a state name (e.g., South Dakota, New York).
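Code that produces the warning:
library(magrittr)  # %<>% and %>%
library(janitor)   # clean_names()
library(tidyr)     # separate()
reps %<>%
  clean_names() %>%
  separate(district, into = c('state', 'd'), sep = '\\s', remove = FALSE)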
Nevertheless, when I run sum(str_detect(reps$District, '\\s')) I get 435, which is the number of rows. So str_detect() does detect whitespace in every row, including the rows where the only whitespace is the one before the number.
A further twist: when I export the data frame to a .csv and then read it back in, the problem with separate() disappears. Still, I would like to know what this invisible problem is.
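One way to see which characters a value really contains is to look at its Unicode code points. Below is a minimal base R sketch of that check. The two strings are made up for illustration, and the no-break space (U+00A0) in the second is only an assumption about what scraped HTML might contain, not something confirmed for this data. The helper odd_code_points() is a hypothetical name.

```r
# Hypothetical values, NOT taken from the scraped data.
plain   <- "South Dakota"                     # ordinary space (U+0020)
suspect <- paste0("Minnesota", "\u00a0", "6") # no-break space (U+00A0), an assumption

# Return the code points of all characters that are not ASCII letters or digits.
odd_code_points <- function(x) {
  cp <- utf8ToInt(x)
  is_alnum <- (cp >= 48 & cp <= 57) |   # 0-9
    (cp >= 65 & cp <= 90) |             # A-Z
    (cp >= 97 & cp <= 122)              # a-z
  cp[!is_alnum]
}

odd_code_points(plain)    # 32  (ordinary space)
odd_code_points(suspect)  # 160 (no-break space)
```

Running the same helper on a few values of the real column (for example reps$District[1]) would show whether the separator before the number is code point 32 or something else.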
Here you can find the .rds and here the .csv, if you're into that kind of thing.