
How do I remove the extra whitespace from the text I extract? If I parse this HTML with Nokogiri:

<div class="address-thoroughfare mobile-inline-comma ng-binding">Kühlungsborner Straße
                    10
                    </div>

I get the following output:

            Kühlungsborner Straße
            10

which is not left-justified.

My code is:

address_street = page_detail.xpath('//div[@class="address-thoroughfare mobile-inline-comma ng-binding"]').text
the Tin Man
matt-rock
  • Try `strip`, i.e. `address_street = page_detail.xpath('//div[@class="address-thoroughfare mobile-inline-comma ng-binding"]').text.strip` – Amol Udage May 19 '16 at 10:25
  • Is this working? – Amol Udage May 19 '16 at 10:30
  • Thanks, this works fine. – matt-rock May 19 '16 at 18:41
  • Your example HTML will not result in the output you're showing; only `10` would be indented. Calling `text` on the result of `xpath`, as with `search`, returns the concatenated text of all nodes in the returned NodeSet. Instead of calling `text` on a method that returns a NodeSet, you should `map` each individual node's `text`, then `strip` those. – the Tin Man May 20 '16 at 23:29
  • See https://stackoverflow.com/a/43594657/128421 for more information. – the Tin Man Feb 14 '20 at 01:24

2 Answers


Please try strip:

address_street = page_detail.xpath('//div[@class="address-thoroughfare mobile-inline-comma ng-binding"]').text.strip
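
Note that `strip` only trims the two ends of the string. A minimal sketch of the difference, assuming the text Nokogiri returns for the question's `<div>` contains a line break and indentation before `10`; the helper name `squish_whitespace` is hypothetical and not part of Ruby or Nokogiri:

```ruby
# Text as Nokogiri would return it for the <div> in the question
# (assumed: a line break plus indentation before "10", and trailing whitespace).
raw = "Kühlungsborner Straße\n                    10\n                    "

# strip removes leading and trailing whitespace only,
# so the inner line break and indentation survive.
stripped = raw.strip
# => "Kühlungsborner Straße\n                    10"

# Hypothetical helper: collapse every run of whitespace into a single space,
# then trim the ends.
def squish_whitespace(text)
  text.gsub(/\s+/, ' ').strip
end

squish_whitespace(raw)
# => "Kühlungsborner Straße 10"
```

If the street and house number should stay on separate lines, `raw.lines.map(&:strip)` would be an alternative under the same assumption.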
the Tin Man
Amol Udage

Consider this:

require 'nokogiri'

doc = Nokogiri::HTML('<div class="address-thoroughfare mobile-inline-comma ng-binding">Kühlungsborner Straße
                    10
                    </div>')
doc.search('div').text
# => "Kühlungsborner Straße\n                    10\n                    "
puts doc.search('div').text

# >> Kühlungsborner Straße
# >>                     10
# >>                     

The given HTML doesn't replicate the problem you're having: only 10 is indented in the output, not the street name. It's really important to present valid input that duplicates the problem. Moving on...

Don't call text directly on the result of xpath, css or search. You usually won't get what you expect:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
  <body>
    <div>
      <span>foo</span>
      <span>bar</span>
    </div>
  </body>
</html>
EOT

doc.search('span').class # => Nokogiri::XML::NodeSet
doc.search('span') # => [#<Nokogiri::XML::Element:0x3fdb6981bcd8 name="span" children=[#<Nokogiri::XML::Text:0x3fdb6981b5d0 "foo">]>, #<Nokogiri::XML::Element:0x3fdb6981aab8 name="span" children=[#<Nokogiri::XML::Text:0x3fdb6981a054 "bar">]>]


doc.search('span').text
# => "foobar"

Note that text returned the concatenated text of all nodes found.

Instead, walk the NodeSet and grab the individual node's text:

doc.search('span').map(&:text)
# => ["foo", "bar"]
the Tin Man