I have a character vector of strings like this:
s=c('1 word word word 1,000,000,000 word 2018 word',
'2 word 2,000,000,000 word 2017',
"3 word1 word2 119,902,000 2017 word3.")
Each string is basically composed of 4 columns, but everything is space separated:
the first column is an index number, running from 1 to 1m;
the second column is 1-10 words, with a single white space between them;
the third column is a large comma-separated integer, potentially with a trailing word;
the fourth column is a year, possibly followed by a word.
I want to create a data frame, so I need to separate each string into those 4 parts.
I tried strsplit(s,' ')
but that gave me different splits, depending on how many spaces there were between the 2nd and 3rd 'columns'.
I believe gsub should be used first. I would like to gsub every run of 2 or more whitespace characters to a tab \t, but am not sure how to do it.
I found some suggestions in "Regex to remove white space between tags in gsub R", but they did not fit my situation.
I tried to substitute on runs of 2 or more whitespace characters:
gsub('\\s{2}','\t','fefl jklj',perl = TRUE)
and
gsub('\\s{2}+','\t','fefl jklj',perl = TRUE)
Both to no avail.
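For reference, here is a minimal sketch of the "2 or more whitespace characters to a tab" substitution described above. As far as I can tell, `\\s{2}` matches exactly two whitespace characters, and with `perl = TRUE` the `+` in `\\s{2}+` makes that quantifier possessive rather than meaning "or more"; the open-ended form is `\\s{2,}`. The input string `x` is made up for illustration and assumes the columns really are separated by at least two spaces.

```r
# Hypothetical input: runs of two or more spaces between the "columns",
# single spaces inside a column.
x <- "fefl  jklj   abc def"

# \\s{2,} matches two or more whitespace characters; single spaces are left alone.
y <- gsub("\\s{2,}", "\t", x, perl = TRUE)

# Splitting on the literal tab then yields one piece per column.
parts <- strsplit(y, "\t", fixed = TRUE)[[1]]
print(parts)
```

This only helps if single spaces never separate two columns, which may not hold for the real data.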
Any suggestions?
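One possible direction, sketched here under assumptions rather than as a definitive solution: capture the 4 parts directly with a single regular expression, so that the number of spaces between columns no longer matters. The pattern and the column names (`index`, `words`, `amount`, `year`) are my own guesses. It assumes the large integer always contains at least one comma group, that a word trailing it starts with a letter, and that the year has 4 digits.

```r
s <- c("1 word word word 1,000,000,000 word 2018 word",
       "2 word 2,000,000,000 word 2017",
       "3 word1 word2 119,902,000 2017 word3.")

# Assumed layout: index, words, comma-grouped integer (+ optional word),
# 4-digit year (+ optional word). Columns may be separated by any whitespace.
pattern <- paste0(
  "^(\\d+)\\s+",                                        # 1st column: index
  "(.+?)\\s+",                                          # 2nd column: words (lazy)
  "(\\d{1,3}(?:,\\d{3})+(?:\\s+[A-Za-z]\\w*)?)\\s+",    # 3rd column: large int
  "(\\d{4}(?:\\s+\\S+)?)\\s*$"                          # 4th column: year
)

# regexec returns the full match plus one entry per capture group.
m <- regmatches(s, regexec(pattern, s, perl = TRUE))

# Drop the full match (first column) and keep the 4 captured parts.
df <- as.data.frame(do.call(rbind, m)[, -1, drop = FALSE],
                    stringsAsFactors = FALSE)
names(df) <- c("index", "words", "amount", "year")
print(df)
```

Strings that do not fit the assumed layout produce no match and would be silently dropped by the `rbind`, so the row count should be checked against `length(s)`.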