This returns a character vector. Read the regex as three capture groups delimited by the parens: the first is any run of consecutive digits (possibly empty), the second is any number of non-digits, and the third is exactly 5 digits. The replacement keeps only the first and the third groups with a space in between; strings that do not match are returned unchanged:
> gsub("([0-9]*)(\\D*)(\\d{5})", "\\1 \\3", test)
[1] "1234 90001" "9876 94501"
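To make the match/no-match behaviour concrete, here is a small self-contained sketch. The test vector below is a hypothetical stand-in (the original input is not shown in this answer), with a third string added that contains no 5-digit run:

```r
# Hypothetical input: the original `test` vector is not shown here.
test <- c("1234 Main St, Los Angeles, CA 90001",
          "9876 Oak Ave, Alameda, CA 94501",
          "no digits here")

# Group 1: leading digits; group 2: non-digits; group 3: five digits.
out <- gsub("([0-9]*)(\\D*)(\\d{5})", "\\1 \\3", test)

# The first two strings are reduced to "number ZIP";
# the third has no 5-digit run, so gsub returns it untouched.
print(out)
```

Note that unmatched strings pass through as-is, so they would need to be filtered out before any later step that expects exactly two fields per record.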
It needs further parsing to split the result into separate vectors, one per field (both still character here):
> scan( text=gsub("([0-9]*)(\\D*)(\\d{5})", "\\1 \\3", test), what=list("", "") )
Read 2 records
[[1]]
[1] "1234" "9876"
[[2]]
[1] "90001" "94501"
It is probably better to read the ZIPs as character (because you will want to preserve leading zeros), but you could convert the street numbers to numeric by changing the types in the what list:
> scan( text=gsub("([0-9]*)(\\D*)(\\d{5})", "\\1 \\3", test), what=list( numeric(), "") )
Read 2 records
[[1]]
[1] 1234 9876
[[2]]
[1] "90001" "94501"
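The leading-zero concern can be seen in a short sketch; the ZIP value used here is purely illustrative and not taken from the original data:

```r
zip <- "02134"                        # hypothetical ZIP with a leading zero

as_num <- as.numeric(zip)             # numeric conversion drops the zero
print(as_num)

# If a ZIP has already been read as a number, zero-padding to width 5
# rebuilds the character form.
padded <- sprintf("%05d", 2134L)
print(padded)
```

Keeping the column as character from the start avoids the need for the padding step.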
To make this more useful, wrap the result in a data frame with named columns:
> setNames( data.frame( scan( text=gsub("([0-9]*)(\\D*)(\\d{5})", "\\1 \\3", test),
                              what=list( numeric(), "") ),
                        stringsAsFactors=FALSE),
            c( "StrtNumber", "ZIP") )
Read 2 records
  StrtNumber   ZIP
1       1234 90001
2       9876 94501
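As a variation, the same two fields can be pulled out with regexec and regmatches, which avoids the round trip through scan and gives NA for strings that do not match. This is only a sketch: the test vector, the pick helper, and the tightened pattern (at least one leading digit, two capture groups instead of three) are assumptions, not part of the approach above.

```r
# Hypothetical input: the original `test` vector is not shown here.
test <- c("1234 Main St, Los Angeles, CA 90001",
          "9876 Oak Ave, Alameda, CA 94501",
          "no digits here")

# Two capture groups: the leading street number and the 5-digit ZIP.
m <- regmatches(test, regexec("([0-9]+)\\D*(\\d{5})", test))

# Each element of m is c(full match, group 1, group 2),
# or character(0) when the string did not match.
# pick() is a hypothetical helper that returns NA for the no-match case.
pick <- function(x, i) if (length(x) >= i) x[i] else NA_character_

addr <- data.frame(
  StrtNumber = as.numeric(vapply(m, pick, character(1), i = 2)),
  ZIP        = vapply(m, pick, character(1), i = 3),  # kept as character
  stringsAsFactors = FALSE
)
print(addr)
```

The design choice here is to keep one row per input string, so unmatched addresses show up as NA rather than silently shifting the records.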