
I have text strings like this:

    s=c('1         word word word      1,000,000,000 word        2018 word',
        '2     word         2,000,000,000 word      2017',
        "3       word1 word2                                              119,902,000        2017 word3.")

Each string is basically a row of 4 columns, but the columns are separated by spaces of varying width rather than by a delimiter.

The first column is an index number, ranging from 1 to 1 million.

The second column is 1 to 10 words, separated by single spaces.

The third column is a large integer with comma thousands separators, possibly followed by a word.

The fourth column is a year, possibly followed by a word.

I want to create a data frame, so I need to split each string into these 4 parts.

I tried strsplit(x, ' '), but that gave a different number of pieces depending on how many spaces were between the 2nd and 3rd 'columns'.
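A minimal sketch of why this happens, using the sample vector s from above. Splitting on a single space yields an empty string for every extra space, so the piece count depends on the gap widths. Splitting on a run of two or more spaces (the regex ' {2,}') keeps each column intact. This assumes, as in the sample, that single spaces only ever occur *inside* a column and that columns are always separated by at least two spaces.

```r
s <- c('1         word word word      1,000,000,000 word        2018 word',
       '2     word         2,000,000,000 word      2017',
       "3       word1 word2                                              119,902,000        2017 word3.")

# Single-space split: each extra space becomes an empty string ""
bad <- strsplit(s, ' ')
lengths(bad)    # varies from row to row

# Split on runs of 2+ spaces: single spaces inside a column are kept
good <- strsplit(s, ' {2,}')
lengths(good)   # 4 4 4
good[[3]]       # "3" "word1 word2" "119,902,000" "2017 word3."
```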

I believe gsub should be used first: I would like to replace every run of 2 or more whitespace characters with a tab (\t), but I am not sure how to do it.
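One possible sketch of the gsub-to-tab idea, under the same assumption that columns are separated by at least two spaces: the quantifier {2,} means "two or more", so each whole run collapses into a single tab. The tab-separated lines can then be parsed with read.delim(text = ...). The column names id, words, amount and year are hypothetical, chosen only for illustration.

```r
s <- c('1         word word word      1,000,000,000 word        2018 word',
       '2     word         2,000,000,000 word      2017',
       "3       word1 word2                                              119,902,000        2017 word3.")

# Replace each run of 2+ whitespace characters with ONE tab;
# single spaces inside a column are left alone
tabbed <- gsub('\\s{2,}', '\t', s, perl = TRUE)

# Parse the tab-separated lines into a data frame
# (quote = "" so stray quote characters in the text are not special)
df <- read.delim(text = tabbed, header = FALSE, quote = "",
                 col.names = c("id", "words", "amount", "year"),
                 stringsAsFactors = FALSE)
str(df)
```

The amount and year columns still contain their optional trailing words (e.g. "1,000,000,000 word"), so they come out as character columns.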

I found some suggestions (Regex to remove white space between tags in gsub R), but they did not fit my situation.

I tried replacing runs of 2 or more whitespace characters in a test string:

    gsub('\\s{2}','\t','fefl    jklj',perl = TRUE)

and

    gsub('\\s{2}+','\t','fefl    jklj',perl = TRUE)

Neither attempt did what I wanted.
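A short sketch of what those two patterns actually do on a string with four spaces. \\s{2} matches exactly two whitespace characters, so a run of four is matched twice and becomes two tabs. With perl = TRUE, {2}+ is PCRE's possessive form of {2}, which still matches exactly two. Changing the quantifier to {2,} ("two or more") makes the whole run a single match.

```r
x <- 'fefl    jklj'   # four spaces between the words

gsub('\\s{2}',  '\t', x, perl = TRUE)   # "fefl\t\tjklj": two matches, two tabs
gsub('\\s{2}+', '\t', x, perl = TRUE)   # "fefl\t\tjklj": possessive {2}, still exactly 2
gsub('\\s{2,}', '\t', x, perl = TRUE)   # "fefl\tjklj": whole run -> one tab
```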

Any suggestions?

frank
