
I have the dataset below, and I need to replace all the whitespace between the three text columns with a single comma. I tried a few options with gsub, but nothing worked. I'd like to do this in R:

gsub("^ *|(?<= ) | *$", ",", all_data, perl=T)

Sample below. The runs of spaces differ in length throughout the file (the bracketed number is just the row number):

> [1] Pig                            Piggy             2     
> [2] Chicken                        Chick            7     
> [3] Cow                               Calf     3

Desired output:

Pig,Piggy,2
Chicken,Chick,7
Cow,Calf,3

Thanks in advance.

SabDeM
SB77

3 Answers

gsub("\\s+", ",", gsub("^\\s+|\\s+$", "",x))
[1] "Pig,Piggy,2"     "Chicken,Chick,7" "Cow,Calf,3"

The inner gsub strips any leading or trailing spaces first, so no extraneous commas are added at either end.
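The two nested gsub calls can also be written with trimws() (in base R since 3.2.0), which strips leading and trailing white space. A minimal sketch, assuming the rows are a character vector like the question's sample; the helper name commas_from_ws is hypothetical:

```r
# Sample rows from the question, including the trailing spaces
x <- c("Pig                            Piggy             2     ",
       "Chicken                        Chick            7     ",
       "Cow                               Calf     3")

# Hypothetical helper: trim both ends first, then turn each
# remaining run of white space into a single comma
commas_from_ws <- function(v) gsub("\\s+", ",", trimws(v))

result <- commas_from_ws(x)
print(result)

# Same result as the nested gsub() version in the answer
stopifnot(identical(result, gsub("\\s+", ",", gsub("^\\s+|\\s+$", "", x))))
```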

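Sometimes odd character strings like this show up when the data is read into R with settings that don't match the file's layout. By using one of the many options of ?read.table you may be able to avoid the problem ahead of time. With the default sep = "", read.table already treats any run of white space as a single field separator. If sep is specified, strip.white = TRUE strips leading and trailing white space from unquoted character fields. Either way, separating the fields with commas then becomes an easier operation.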
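A sketch of the read.table route, assuming the raw rows are available as a character vector (called lines here for illustration). The default sep = "" splits each row on runs of white space, and paste() rebuilds the comma-joined strings:

```r
lines <- c("Pig                            Piggy             2     ",
           "Chicken                        Chick            7     ",
           "Cow                               Calf     3")

# Default sep = "": any run of spaces/tabs separates fields,
# so the ragged spacing does not matter
df <- read.table(text = lines, header = FALSE, stringsAsFactors = FALSE)
str(df)  # three columns: V1, V2 (character) and V3 (integer)

# Paste the columns back together row by row, separated by commas
out <- do.call(paste, c(df, sep = ","))
print(out)
```

With an actual file, pass its path as file instead of text; write.table(df, sep = ",", quote = FALSE, row.names = FALSE, col.names = FALSE) can then write the comma-separated version directly.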

Pierre L
  • And might I recommend [this](http://www.regular-expressions.info/rlanguage.html) intro to `regex`? – MichaelChirico Aug 24 '15 at 15:12
  • The last run of spaces will add a trailing comma to the string. The OP's input has trailing spaces after the last number, e.g. after the 2 in the first case. – Frash Aug 24 '15 at 15:14
  • @Frash I was thinking the same thing, but it's not clear to me if those are real trailing spaces or just an artifact of how R chooses to print results. – Frank Aug 24 '15 at 15:17
  • @Frank Based on the OP's code (` *$`), it looks like there are trailing spaces. – akrun Aug 24 '15 at 15:18
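Whether the trailing spaces are real can be checked directly rather than inferred from printed output. A minimal sketch with made-up rows, where only the first row has trailing spaces:

```r
# Made-up rows: the first has trailing spaces, the second does not
x <- c("Pig    Piggy   2     ", "Cow  Calf 3")

# TRUE where a string ends in white space
has_trailing <- grepl("\\s+$", x)
print(has_trailing)

# Quoted printing makes trailing spaces visible inside the quotes
print(x, quote = TRUE)

# Without trimming first, a trailing run of spaces becomes a trailing comma
naive <- gsub("\\s+", ",", x)
print(naive)
```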

As in my previous comment (thanks to akrun for the suggestions):

gsub("[[:blank:]]+", ",", x)
[1] "Pig,Piggy,2"     "Chicken,Chick,7" "Cow,Calf,3"    

Data:

x <- c("Pig                            Piggy             2", "Chicken                        Chick            7", 
"Cow                               Calf     3")
SabDeM
  • This looks the same as Pierre's answer. I suppose one could also write the regex as `" +"` and a few other ways... – Frank Aug 24 '15 at 15:14
  • @Frank My solution uses POSIX character classes, while Pierre's uses the standard regex "character shorthands"; it's just another way of expressing the same thing, I guess. – SabDeM Aug 24 '15 at 15:21
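The variants mentioned in these comments give the same result on space-only input but differ on other white space: in R's default regex engine, \\s matches any white space, [[:blank:]] matches only spaces and tabs, and " +" matches only literal spaces. A small sketch with a made-up row containing a tab:

```r
# Made-up row: a tab between the first two fields, spaces elsewhere
row <- "Pig\tPiggy   2"

v_s     <- gsub("\\s+", ",", row)          # any white space
v_blank <- gsub("[[:blank:]]+", ",", row)  # spaces and tabs only
v_space <- gsub(" +", ",", row)            # literal spaces only; the tab survives

print(c(v_s, v_blank, v_space))
```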

I would probably take the long way around instead of using pure regex:

sapply(strsplit(s," +"), paste0, collapse=",")
# [1] "Pig,Piggy,2"     "Chicken,Chick,7" "Cow,Calf,3" 

Or, as Pierre mentioned, read the data in correctly from the get-go.


Data:

s = c("Pig                            Piggy             2     ",
      "Chicken                        Chick            7     ",
      "Cow                               Calf     3")
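strsplit() drops the empty piece after a trailing match but keeps an empty "" before a leading match, so this approach copes with the trailing spaces in s but would add a leading comma if a row started with spaces. A sketch with made-up rows, trimming first with trimws():

```r
# Made-up rows: the first starts and ends with spaces
rows <- c("   Pig   Piggy  2   ", "Cow Calf 3")

# Leading match -> "" as the first piece -> leading comma
naive <- sapply(strsplit(rows, " +"), paste0, collapse = ",")
print(naive)

# Trimming both ends first avoids the empty first piece
safe <- sapply(strsplit(trimws(rows), " +"), paste0, collapse = ",")
print(safe)
```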
Frank