Splitting strings in between 3rd and 4th characters in R

Question

I'm grabbing information from Wikipedia on Canadian Forward Sortation Areas (FSAs, the first 3 characters of Canadian postal codes) and the cities/areas they belong to. An example of this information is below:

library(rvest)
library(tidyverse)

URL <- paste0("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_", "K")

FSAs <- URL %>% 
  read_html() %>% 
  html_nodes(xpath = "//td") %>% 
  html_text()

head(FSAs)
[1] "K1AGovernment of CanadaOttawa and Gatineau offices (partly in QC)\n"            "K2AOttawa(Highland Park / McKellar Park /Westboro /Glabar Park /Carlingwood)\n"
[3] "K4AOttawa(Fallingbrook)\n"                                                      "K6AHawkesbury\n"                                                               
[5] "K7ASmiths Falls\n"                                                              "K8APembrokeCentral and northern subdivisions\n"

The problem I'm facing is that I would like a data frame with the first 3 characters of each string in one column and the rest of the information in another. I thought there would be a solution involving a stringr function like str_split(), but this removes the matched pattern (the first 3 characters), which I of course don't want. In effect, I'm looking to split each string between its 3rd and 4th characters.

I've figured out this solution, with the last bit borrowed from this answer, but it's incredibly hackish. My question is, is there a better way of doing this?

FSAs %>% 
  enframe(name = NULL) %>%
  separate(value, c(NA, "Location"), sep = "^...", remove = FALSE) %>% 
  separate(value, c("FSA", NA), sep = "(?<=\\G...)")

# A tibble: 195 x 2
   FSA   Location                                                                     
   <chr> <chr>                                                                        
 1 K1A   "Government of CanadaOttawa and Gatineau offices (partly in QC)\n"           
 2 K2A   "Ottawa(Highland Park / McKellar Park /Westboro /Glabar Park /Carlingwood)\n"
 3 K4A   "Ottawa(Fallingbrook)\n"                                                     
 4 K6A   "Hawkesbury\n"                                                               
 5 K7A   "Smiths Falls\n"                                                             
 6 K8A   "PembrokeCentral and northern subdivisions\n"                                
 7 K9A   "Cobourg\n"                                                                  
 8 K1B   "Ottawa(Blackburn Hamlet / Pine View / Sheffield Glen)\n"                    
 9 K2B   "Ottawa(Britannia /Whitehaven / Bayshore / Pinecrest)\n"                     
10 K4B   "Ottawa(Navan)\n"

See the function: `substr()`, one can specify the start and stop positions. For example: `substr(x, 1, 3)` — Dave2e, Nov 14 '19 at 21:53
`data.frame(FSA = substring(FSAs, 1, 3), Location = substring(FSAs, 4))` works, thanks. — Phil, Nov 14 '19 at 21:57
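
For reference, the `substring()` approach from the comments can be sketched as below. This is a minimal sketch, not a definitive solution: the three sample strings are copied from the `head(FSAs)` output in the question so that it runs without scraping, the names `fsa_df` and `fsa_tbl` are made up for illustration, and the `trimws()` call (to drop the trailing newline) is an addition not present in the original comment. The second part assumes tidyr is installed and relies on `separate()` accepting a numeric `sep`, which it treats as a character position rather than a regular expression.

```r
# Sample values copied from the head(FSAs) output in the question
FSAs <- c("K4AOttawa(Fallingbrook)\n", "K6AHawkesbury\n", "K7ASmiths Falls\n")

# Base R: characters 1-3 are the FSA, everything from character 4 on is the location
fsa_df <- data.frame(
  FSA = substring(FSAs, 1, 3),
  Location = trimws(substring(FSAs, 4)),  # trimws() drops the trailing "\n" (assumed to be unwanted)
  stringsAsFactors = FALSE
)

# tidyr alternative (only run if tidyr is available):
# a numeric `sep` splits at that character position, so nothing is consumed by a pattern
if (requireNamespace("tidyr", quietly = TRUE)) {
  fsa_tbl <- tidyr::separate(
    data.frame(value = FSAs, stringsAsFactors = FALSE),
    value, into = c("FSA", "Location"), sep = 3
  )
}
```

Either form replaces the two chained `separate()` calls with a single positional split, so no regular expression is needed.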
