How to remove white space from HTML text

Question

How do I remove the extra whitespace from text I extract from HTML? If I parse this HTML with Nokogiri:

<div class="address-thoroughfare mobile-inline-comma ng-binding">Kühlungsborner Straße
                    10
                    </div>

I get the following output:

            Kühlungsborner Straße
            10

which is not left-justified.

My code is:

address_street = page_detail.xpath('//div[@class="address-thoroughfare mobile-inline-comma ng-binding"]').text

try `strip` i.e `address_street = page_detail.xpath('//div[@class="address-thoroughfare mobile-inline-comma ng-binding"]').text.strip` — Amol Udage, May 19 '16 at 10:25
Your example HTML will not result in the output you're showing. Only `10` would be indented. Using `text` with `xpath`, similar to `search` returns concatenated text from the nodes returned by `xpath`'s NodeSet. Instead of using `text` with a method that returns a NodeSet, you should `map` each individual node's `text`, then `strip` those. — the Tin Man, May 20 '16 at 23:29
See https://stackoverflow.com/a/43594657/128421 for more information. — the Tin Man, Feb 14 '20 at 01:24

score 2 · Answer 1 · edited Feb 14 '20 at 01:25

Try `strip`, which removes leading and trailing whitespace from the string:

address_street = page_detail.xpath('//div[@class="address-thoroughfare mobile-inline-comma ng-binding"]').text.strip
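
`strip` leaves whitespace inside the string untouched, so the newline and indentation between the street name and the house number would remain. A minimal sketch of collapsing that interior whitespace as well, using plain Ruby string methods; `collapse_whitespace` is a hypothetical helper name, not part of Nokogiri:

```ruby
# Hypothetical helper (not part of Nokogiri): collapses every run of
# whitespace into a single space, then trims both ends.
def collapse_whitespace(text)
  text.gsub(/\s+/, ' ').strip
end

# The string that `text` returns for the div in the question.
raw = "Kühlungsborner Straße\n                    10\n                    "

raw.strip
# => "Kühlungsborner Straße\n                    10"

collapse_whitespace(raw)
# => "Kühlungsborner Straße 10"
```

Whether to collapse the line break into a space or keep the two lines separate depends on how the address will be displayed.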

edited Feb 14 '20 at 01:25

the Tin Man

answered May 20 '16 at 05:42

Amol Udage

score 1 · Accepted Answer · answered May 20 '16 at 23:39

Consider this:

require 'nokogiri'

doc = Nokogiri::HTML('<div class="address-thoroughfare mobile-inline-comma ng-binding">Kühlungsborner Straße
                    10
                    </div>')
doc.search('div').text
# => "Kühlungsborner Straße\n                    10\n                    "
puts doc.search('div').text

# >> Kühlungsborner Straße
# >>                     10
# >>

The given HTML doesn't reproduce the output you showed: only `10` is indented, not the first line. It's really important to present valid input that duplicates the problem. Moving on...

Don't call `text` directly on the result of `xpath`, `css` or `search`. Those methods return a NodeSet, and you usually won't get what you expect:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
  <body>
    <div>
      <span>foo</span>
      <span>bar</span>
    </div>
  </body>
</html>
EOT

doc.search('span').class # => Nokogiri::XML::NodeSet
doc.search('span') # => [#<Nokogiri::XML::Element:0x3fdb6981bcd8 name="span" children=[#<Nokogiri::XML::Text:0x3fdb6981b5d0 "foo">]>, #<Nokogiri::XML::Element:0x3fdb6981aab8 name="span" children=[#<Nokogiri::XML::Text:0x3fdb6981a054 "bar">]>]


doc.search('span').text
# => "foobar"

Note that text returned the concatenated text of all nodes found.

Instead, walk the NodeSet and grab each individual node's text:

doc.search('span').map(&:text)
# => ["foo", "bar"]
