Removing whitespace, tabs and new lines from array

Question

How do I go about removing the tabs, new lines, and whitespaces from this array?

array1 = ["E", "A", "C", "H", " ", "L", "I", "N", "E", " ", "E", "N", "D", "S", " ", "W", "I", "T", "H", " ", "A", " ", "A", "C", "C", "I", "D", "E", "N", "T", "A", "L", "L", "Y", " ", " ", "A", "D", "\"", "A", " ", "A", "C", "C", "I", "\n", "\""]

I have tried the following, and none of these seem to work properly.

array1.map!(&:strip)

array1.reject!(&:empty?)

array1.reject(&:empty?)

array1 - [""]

array1.delete_if {|x| x == " " }

Cary Swoveland · Accepted Answer · 2019-02-25T07:07:05.590

4

array1 = ["E", " ", ":", "L", "É", "\t", "T",
          "-", "H", "\n", "\""]

array1.reject { |s| s.match? /\s/ }
  #=> ["E", ":", "L", "É", "T", "-", "H", "\""]

\s in a regular expression matches whitespace characters, namely spaces, tabs ("\t"), newlines ("\n"), carriage returns ("\r") and formfeeds ("\f"). (Ruby's \s also matches vertical tabs, "\v".)

Carriage returns and formfeeds have their origins in the days when teletype machines were used, the carriage return being the movement of the printhead from the end to the beginning of the line and the formfeed advancing the paper to the top of the next page.¹

¹ Microsoft Windows still recognizes carriage returns and formfeeds, thereby maintaining support for teletype machines. ¯\_(ツ)_/¯
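As a quick check of the whitespace escapes named in this answer, here is a minimal sketch. The helper name `without_whitespace` and the sample arrays are made up for illustration and are not part of the answer.

```ruby
# Hypothetical helper (not from the answer): drop every element that
# contains at least one whitespace character.
def without_whitespace(array)
  array.reject { |s| s.match?(/\s/) }
end

# Each of these one-character strings is matched by \s ...
p without_whitespace([" ", "\t", "\n", "\r", "\f"])
# prints []

# ... while letters and punctuation are left alone.
p without_whitespace(["E", " ", ":", "\t", "-", "\n", "\""])
# prints ["E", ":", "-", "\""]
```

Because `reject` returns a new array, the result has to be assigned (or `reject!` used) if `array1` itself should change.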

edited Feb 25 '19 at 07:07

answered Feb 25 '19 at 03:12

Cary Swoveland


Perhaps you want to see his/her second question in the accepted answer's comment. – Sebastián Palma Feb 25 '19 at 03:24
@Sebastian, the example (which I modified slightly) should address that. – Cary Swoveland Feb 25 '19 at 03:29
`\p{Space}` would also get rid of other unicode whitespace :) – Kimmo Lehto Feb 25 '19 at 07:22
Alternatively if you like the [point-free/tacit](https://stackoverflow.com/questions/944446/what-is-point-free-style-in-functional-programming) approach: `array1.reject(&/\s/.method(:match?))` – 3limin4t0r Feb 25 '19 at 11:09
@Johan, I'm not particularly fond of that construct (which seems to be increasingly popular), but maybe that just reflects my lack of familiarity. – Cary Swoveland Feb 25 '19 at 17:26
The concept is pretty simple. You remove the variable assignment in between and instead tell `reject` to forward the block arguments to the method `match?` on `/\s/`. – 3limin4t0r Feb 25 '19 at 18:38
Yes, I understand, I'm just not sure it's worth it to avoid having a block. – Cary Swoveland Feb 25 '19 at 20:47
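To illustrate Kimmo Lehto's comment: in Ruby, `\s` covers only ASCII whitespace, so a character such as the non-breaking space (U+00A0) slips through, whereas `\p{Space}` matches it. A minimal sketch, with helper names and sample data invented for illustration:

```ruby
# Hypothetical helpers, for illustration only.
def reject_ascii_space(array)
  array.reject { |s| s.match?(/\s/) }
end

def reject_unicode_space(array)
  array.reject { |s| s.match?(/\p{Space}/) }
end

# Made-up data: includes a non-breaking space (U+00A0) and an
# ideographic space (U+3000) next to ordinary ASCII whitespace.
chars = ["A", " ", "\u00A0", "B", "\u3000", "\n"]

p reject_ascii_space(chars).size   # prints 4, both non-ASCII spaces survive
p reject_unicode_space(chars)      # prints ["A", "B"]
```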

Stefan · Answer 2 · 2019-02-25T12:00:01.317

2

You can use grep to select elements matching a pattern. That pattern can be a simple regexp like /\s/ which matches whitespace characters:

array1.grep(/\s/)
#=> [" ", " ", " ", " ", " ", " ", " ", " ", "\n"]

The result is an array with all elements containing at least one whitespace character.

There's also \S (uppercase) which matches non-whitespace characters:

array1.grep(/\S/)
#=> ["E", "A", "C", "H", "L", "I", "N", "E", "E", "N", "D", "S", "W",
#    "I", "T", "H", "A", "A", "C", "C", "I", "D", "E", "N", "T", "A",
#    "L", "L", "Y", "A", "D", "\"", "A", "A", "C", "C", "I", "\""]

And we have grep_v which is the inverted version of grep. This would be useful if you wanted to specify space, tab and newline explicitly:

array1.grep_v(/[ \t\n]/)
#=> ["E", "A", "C", "H", "L", "I", "N", "E", "E", "N", "D", "S", "W",
#    "I", "T", "H", "A", "A", "C", "C", "I", "D", "E", "N", "T", "A",
#    "L", "L", "Y", "A", "D", "\"", "A", "A", "C", "C", "I", "\""]
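Like `reject`, `grep` and `grep_v` return a new array and leave the receiver untouched, so the result has to be assigned. A small sketch with a hypothetical wrapper and made-up data, also checking that `grep_v(/\s/)` agrees with the `reject` form from the accepted answer:

```ruby
# Hypothetical wrapper around grep_v, for illustration only.
def drop_whitespace(array)
  array.grep_v(/\s/)
end

sample  = ["A", " ", "B", "\t", "C", "\n"]   # made-up data
cleaned = drop_whitespace(sample)

p cleaned       # prints ["A", "B", "C"]
p sample.size   # prints 6, the original array is unchanged
p cleaned == sample.reject { |s| s.match?(/\s/) }   # prints true
```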

edited Feb 25 '19 at 12:00

answered Feb 25 '19 at 09:55

Stefan


This is the most suitable answer imo. – 3limin4t0r Feb 25 '19 at 11:12
@Johan, I'll second that, in part because of the reference to `grep_v`, which I've not seen used before. Ann, please consider moving the greenie to this answer. – Cary Swoveland Feb 25 '19 at 16:19
Readers: repeat 100 times, "To select with a regex, think grep. To reject, grep_v." – Cary Swoveland Feb 25 '19 at 17:22
^ The better mnemonic would be: Can I match elements using the case equality (`===`)? If the answer is *yes* use `grep`. Another example without regex could be: `[1, 'A', :b].grep(Integer) #=> [1]` or `(1..100).grep(95..150) #=> [95, 96, 97, 98, 99, 100]` – 3limin4t0r Feb 26 '19 at 11:18

score 0 · Answer 3 · answered Feb 25 '19 at 09:41

In addition, here are some other possible variants:

array1 = [" ", "A", "\n", "\t", "B", "\r"]
array1.delete_if { |s| s.match? /\s/ }
#=> ["A", "B"]

array1 = [" ", "A", "\n", "\t", "B", "\r"]
array1.keep_if { |s| !s.match? /\s/ }
#=> ["A", "B"]

array1 = [" ", "A", "\n", "\t", "B", "\r"]
array1.select! { |s| !s.match? /\s/ }
#=> ["A", "B"]
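All three of these methods modify `array1` in place. One difference that can matter when chaining calls: `select!` (like `reject!`) returns `nil` when nothing was removed, whereas `delete_if` and `keep_if` always return the array. A minimal sketch with made-up data that contains no whitespace:

```ruby
clean = ["A", "B"]   # made-up data, nothing to remove

# dup is used so that each call starts from the same contents.
p clean.dup.delete_if { |s| s.match?(/\s/) }    # prints ["A", "B"]
p clean.dup.keep_if   { |s| !s.match?(/\s/) }   # prints ["A", "B"]
p clean.dup.select!   { |s| !s.match?(/\s/) }   # prints nil
p clean.dup.reject!   { |s| s.match?(/\s/) }    # prints nil
```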

Using match? rather than match is preferable, and not only because the MatchData object is not needed here.

Benchmarks show that match? is almost twice as fast.

This can be significant when working with large amounts of data.
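A rough way to compare the two on your own machine is the standard library's `Benchmark` module. This is only a sketch: the workload below is invented, and the timings will vary with the Ruby version and hardware, so no expected figures are shown.

```ruby
require "benchmark"

# Made-up workload: 100,000 one-character strings, one in four of
# them being a space.
data = ["a", "b", " ", "c"] * 25_000

Benchmark.bm(8) do |x|
  # match? only answers true/false and does not build a MatchData object.
  x.report("match?") { data.reject { |s| s.match?(/\s/) } }
  # match allocates a MatchData object for every successful match.
  x.report("match")  { data.reject { |s| s.match(/\s/) } }
end
```

Both variants produce the same filtered array; only the time taken and the number of objects allocated differ.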
