
I have some strings that are spaced as I want but that have leading digits that I don't want. I want to replace each of these leading digits with an equal number of spaces so as to maintain the spacing. I can do this with the gsubfn package but am curious if there's a native R regex way to accomplish this task.

Can I accomplish the same result as below using only native R regex functions?

MWE:

library(gsubfn)

string <- c(
    "1    12  end line", 
    "10   3   end line", 
    "50   444 end line", 
    "100  54  end line", 
    "1000 5   end line"
)

# Replace the leading run of digits with one space per digit
gsubfn('^\\d+', function(x) gsub('\\d', ' ', x), string)

Desired Result:

[1] "     12  end line"
[2] "     3   end line"
[3] "     444 end line"
[4] "     54  end line"
[5] "     5   end line"
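For comparison, here is a minimal sketch that mirrors the gsubfn callback using only base R: regexpr() locates the leading run of digits, and the `regmatches<-` replacement function overwrites each match with strrep(" ", n), where n is the match length. The variable names m and len are illustrative only. Because the pattern is anchored with ^, each string has at most one match, and strings without leading digits are simply skipped.

```r
string <- c(
    "1    12  end line",
    "10   3   end line",
    "50   444 end line",
    "100  54  end line",
    "1000 5   end line"
)

# Locate the leading run of digits in each string
m <- regexpr("^\\d+", string)
# Length of each match; -1 where a string has no leading digit
len <- attr(m, "match.length")
# Overwrite each matched run with the same number of spaces
regmatches(string, m) <- strrep(" ", len[len > 0])

string
```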
Tyler Rinker
    An alternative to Wiktor's answer might be to just remove the leading stuff you don't want and then use sprintf to pad the leading whitespace back in. Now that a good answer has been posted I don't feel like writing that solution up, but if speed is an issue you could try a few different ways and see which gives the best results. – Dason Oct 20 '17 at 13:47
    @Dason yeah I thought about a similar multi step approach but the `'\\G'` was the type of single line solution I was hoping existed. – Tyler Rinker Oct 20 '17 at 14:02
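A sketch of the two-step approach Dason describes, assuming plain ASCII strings like the ones in the question: sub() drops the leading digits, then sprintf() with a `*` field width (taken from nchar() of the original string) left-pads each result back to its original length. That restores exactly one space per removed digit. The names stripped and padded are illustrative only.

```r
string <- c(
    "1    12  end line",
    "10   3   end line",
    "50   444 end line",
    "100  54  end line",
    "1000 5   end line"
)

# Step 1: remove the leading digits with a single anchored sub()
stripped <- sub("^\\d+", "", string)

# Step 2: right-justify each stripped string in a field as wide as the
# original; "%*s" reads the width from the nchar(string) argument
padded <- sprintf("%*s", nchar(string), stripped)

padded
```

This trades the single \G regex for two simpler vectorized calls; as Dason notes, which one is faster is something to benchmark on real data rather than assume.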

1 Answer


You want to replace each digit in the run of digits at the start of the string with a space.

Use

> gsub("\\G\\d", " ", string, perl=TRUE)
[1] "     12  end line" 
[2] "     3   end line" 
[3] "     444 end line"
[4] "     54  end line" 
[5] "     5   end line"

See the online regex demo (a bit modified to work with a multiline string input).

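The `\G\d` pattern matches a digit only at the start of the string or right where the previous successful match ended (that is what `\G` asserts), and each matched digit is replaced with a space. Because every match has to begin where the last one stopped, the chain breaks at the first non-digit, so the digits later in the line (`12`, `3`, and so on) are left untouched. `\G` is a PCRE feature, which is why `perl=TRUE` is required.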
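To see why `\G` is the piece that makes this work, here is a small comparison sketch on the same `string` vector against two patterns without it: plain `\\d` replaces every digit in the line, while `^\\d` replaces only the first digit, because `^` can match only once per string. The variable names are illustrative only.

```r
string <- c(
    "1    12  end line",
    "10   3   end line",
    "50   444 end line",
    "100  54  end line",
    "1000 5   end line"
)

# \G chains matches from the start of the string, stopping at the first non-digit
with_G <- gsub("\\G\\d", " ", string, perl = TRUE)

# No anchor: every digit anywhere in the line is replaced, including "12", "3", ...
all_digits <- gsub("\\d", " ", string, perl = TRUE)

# ^ anchor: only one match per string, so "10" becomes " 0", "100" becomes " 00", ...
caret_only <- gsub("^\\d", " ", string, perl = TRUE)

with_G
all_digits
caret_only
```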

Wiktor Stribiżew