4

How to remove whitespaces between letters NOT numbers

For example:

Input

I ES P 010 000 000 000 000 000 001 001 000 000 IESP 000 000

Output

IESP 010 000 000 000 000 000 001 001 000 000 IESP 000 000

I tried something like this:

gsub("(?<=\\b\\w)\\s(?=\\w\\b)", "", x,perl=T)

But it did not give the output I was hoping for.

pogibas
jonnyblue8

2 Answers

6

Use gsub to match a space that sits between two letters, capture both letters, and replace the match with the two captured letters ("\\1\\2"), which drops the space.

Input <- "I ES P 010 000 000 000 000 000 001 001 000 000 IESP 000 000"
gsub("([A-Z]) ([A-Z])", "\\1\\2", Input)
[1] "IESP 010 000 000 000 000 000 001 001 000 000 IESP 000 000"

Edit after @Wiktor Stribiżew's comment (replaced [A-z] with [a-zA-Z]):

To match both lower and upper case, use [a-zA-Z]:

Input <- "I ES P 010 000 000 000 000 000 001 001 000 000 IESP 000 000 aaa ZZZ"
gsub("([a-zA-Z]) ([a-zA-Z])", "\\1\\2", Input)
[1] "IESP 010 000 000 000 000 000 001 001 000 000 IESP 000 000 aaaZZZ"
pogibas
  • [Do not use `[A-z]` to only match letters](https://stackoverflow.com/questions/29771901/why-is-this-regex-allowing-a-caret/29771926#29771926). It matches more than just letters. Besides, the `gsub("([A-Z]) ([A-Z])", "\\1\\2", Input)` won't work for `I E P 1000` input. – Wiktor Stribiżew Oct 04 '17 at 11:21
  • See my answer for overlapping cases. – Wiktor Stribiżew Oct 04 '17 at 11:29
  • probably even better to use `[[:alpha:]]` rather than `[A-Za-z]` in case you happen to be in [an Estonian locale](https://stackoverflow.com/questions/6799872/how-to-make-grep-a-z-independent-of-locale) ... – Ben Bolker Oct 04 '17 at 11:29
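The two points raised in the comments can be checked directly. The sketch below is an illustration only (the variable names are made up here, not taken from the answer). It shows that `[A-z]` also matches punctuation that sits between `Z` and `a` in ASCII, and that the capture-group pattern misses a space when two matches would have to share a letter.

```r
# [A-z] covers every ASCII code from "A" to "z", which includes
# the characters [ \ ] ^ _ ` in addition to the letters.
# perl = TRUE is used so the range is evaluated by code point.
underscore_matches <- grepl("[A-z]", "_", perl = TRUE)
underscore_matches

# Capture groups consume both letters, so after "I E" is replaced
# the "E" is no longer available to pair with the following " P".
overlap_result <- gsub("([A-Z]) ([A-Z])", "\\1\\2", "I E P 1000")
overlap_result
```

On this input, the second call leaves one space in place ("IE P 1000"), which is the overlapping case mentioned in the comment.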
3

You need to use lookarounds so that overlapping cases are handled as well:

Input <- "I ES P E ES P 010 000 000 000 000 000 001 001 000 000 IESP 000 000"
gsub("(?<=[A-Z])\\s+(?=[A-Z])", "", Input, perl=TRUE, ignore.case = TRUE)
## gsub("(*UCP)(?<=\\p{L})\\s+(?=\\p{L})", "", Input, perl=TRUE) ## for Unicode


NOTE: ignore.case = TRUE makes the pattern case-insensitive. Remove this argument if that is not what you want.

Details

  • (?<=[A-Z]) (or (?<=\\p{L})) - a letter must appear immediately to the left of the current location (without being added to the match)
  • \\s+ - one or more whitespace characters
  • (?=[A-Z]) (or (?=\\p{L})) - a letter must appear immediately to the right of the current location (without being added to the match).
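Because the lookarounds only test the neighbouring letters without consuming them, the same letter can serve as the right-hand context of one space and the left-hand context of the next. A minimal sketch of that behaviour follows. It wraps the pattern in a helper named `squash_letter_gaps`, which is a hypothetical name introduced here for illustration and is not part of the answer.

```r
# Hypothetical helper (not from the answer): remove whitespace runs that
# sit between two letters and leave whitespace next to digits untouched.
squash_letter_gaps <- function(x, unicode = FALSE) {
  pattern <- if (unicode) {
    # Any Unicode letter on both sides, as in the commented-out variant
    "(*UCP)(?<=\\p{L})\\s+(?=\\p{L})"
  } else {
    # ASCII letters only, spelled out instead of using ignore.case
    "(?<=[A-Za-z])\\s+(?=[A-Za-z])"
  }
  gsub(pattern, "", x, perl = TRUE)
}

squash_letter_gaps("I E P 1000")
squash_letter_gaps("I ES P E ES P 010 000 IESP 000")
```

The first call should give "IEP 1000" and the second "IESPEESP 010 000 IESP 000". Spaces that touch a digit on either side are kept.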
Wiktor Stribiżew