I have a question about removing whitespace inside character strings in a data frame column. This is my column:

head(data$HO)
[1] "Lidar; Wind field; Temperature; Aerosol; Fabry-Perot etalon"                             
[2] "Compressive ghost imaging; Guided filter; Single-pixel imaging"    

This question differs from other whitespace-removal questions because I want to remove only the spaces that follow the ";" symbol, so the output should look like this:

head(data$HO)
[1] "Lidar;Wind field;Temperature;Aerosol;Fabry-Perot etalon"                             
[2] "Compressive ghost imaging;Guided filter;Single-pixel imaging"    

I have tried

data$HO <- gsub("\\;s", ";",data$HO)

but it doesn't work.

Any suggestion?

Amleto
  • Possible duplicate of [How to trim leading and trailing whitespace in R?](https://stackoverflow.com/questions/2261079/how-to-trim-leading-and-trailing-whitespace-in-r) – pogibas Feb 05 '18 at 21:39
  • Very similar to https://stackoverflow.com/questions/41264545/regex-how-to-remove-blank-space-after-a-period-before-a-punctuation-character – thelatemail Feb 05 '18 at 22:41

2 Answers

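Your pattern "\\;s" matches a semicolon followed by a literal letter s, because the backslash escapes the ; rather than the s. You may use the ;\s+ pattern instead and replace each match with ;: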

> x <- c("Lidar; Wind field; Temperature; Aerosol; Fabry-Perot etalon", "Compressive ghost imaging; Guided filter; Single-pixel imaging")
> gsub(";\\s+", ";", x)
[1] "Lidar;Wind field;Temperature;Aerosol;Fabry-Perot etalon"     
[2] "Compressive ghost imaging;Guided filter;Single-pixel imaging"

Pattern details:

  • ; - a semi-colon
  • \s+ - one or more whitespace chars.
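Applied to the column from the question, the same call might look like the sketch below. The data frame here is a hypothetical two-row reconstruction built from the output shown in the question; only the names `data` and `HO` come from the original post.

```r
# Hypothetical reconstruction of the asker's data frame (only two rows are shown in the question)
data <- data.frame(
  HO = c("Lidar; Wind field; Temperature; Aerosol; Fabry-Perot etalon",
         "Compressive ghost imaging; Guided filter; Single-pixel imaging"),
  stringsAsFactors = FALSE
)

# Replace each semicolon plus the whitespace after it with a bare semicolon
data$HO <- gsub(";\\s+", ";", data$HO)

print(data$HO)
```

Spaces inside the terms themselves (for example "Wind field") are left alone because they are not preceded by a semicolon.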


Some more variations of the solution:

gsub("(*UCP);\\K\\s+", "", x, perl=TRUE)
gsub(";[[:space:]]+", ";", x)
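
A short sketch comparing the variations, assuming a small made-up input vector (the variable names are illustrative only). In the PCRE version, \K drops the already-matched ; from the match so only the whitespace is replaced, and (*UCP) makes \s Unicode-aware; [[:space:]] is the POSIX class for whitespace.

```r
# Made-up sample input; the second element has two spaces after the semicolon
x <- c("Lidar; Wind field; Temperature",
       "Guided filter;  Single-pixel imaging")

# Default regex engine with \s
base_result <- gsub(";\\s+", ";", x)

# PCRE: \K keeps the semicolon out of the match, so the replacement is empty
keep_result <- gsub("(*UCP);\\K\\s+", "", x, perl = TRUE)

# POSIX character class instead of \s
posix_result <- gsub(";[[:space:]]+", ";", x)

# On this input, all three calls should return the same strings
print(identical(base_result, keep_result))
print(identical(base_result, posix_result))
```
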
Wiktor Stribiżew

Another option is a positive look-behind, (?<=;), which requires perl = TRUE. The pattern matches \s+ only when it is preceded by ; and replaces that whitespace with an empty string.

v <- c("Lidar; Wind field; Temperature; Aerosol; Fabry-Perot etalon", 
      "Compressive ghost imaging; Guided filter; Single-pixel imaging")

gsub("(?<=;)\\s+", "", v, perl = TRUE)

# Result:
# [1] "Lidar;Wind field;Temperature;Aerosol;Fabry-Perot etalon"     
# [2] "Compressive ghost imaging;Guided filter;Single-pixel imaging"
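
As a small illustration of what the look-behind does and does not touch, here is a sketch on a made-up string (not from the question) that mixes several spaces, a tab, and a space that does not follow a semicolon:

```r
# Made-up input: two spaces after the first ";", a tab after the second,
# and one ordinary space between "b" and "c"
s <- "a;  b c;\td"

# Only whitespace directly preceded by ";" is removed
cleaned <- gsub("(?<=;)\\s+", "", s, perl = TRUE)

print(cleaned)
# [1] "a;b c;d"
```
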
MKR