I'm trying to use the squish method to reduce runs of whitespace in a string to single spaces. However, I have a string with multiple spaces that are not reduced. When I check string[space_position].blank? it returns true, but the character is neither empty nor == ' '.

What could cause this behavior?

Not sure if this is relevant, but the string comes from a mongoDB and was saved there by Locomotive CMS.

Cornflex
  • Can you add 1) string before method call 2) args you are calling method with 3) what you would expect? This would make the question a bit clearer. – Puhlze May 18 '13 at 16:07
  • @Puhlze 1) The string is something like this: `string = "sth: Sth"`. 2) Which method do you mean? I'm simply calling string.squish, no arguments. 3) I would expect squish to output `"sth: Sth"`, and `string[6] == ' '` to be true. – Cornflex May 18 '13 at 16:20
  • In the first code example of my above comment there are actually 3 spaces between the colon and the S. – Cornflex May 18 '13 at 16:28
  • provide output of `string.chars.map(&:ord)`. – tokland May 18 '13 at 16:29
  • @tokland: the three spaces: [32,160,32]. By the way, the spaces are created by `Sanitize.clean(string).to_json[1..-2]`. – Cornflex May 18 '13 at 16:33
  • what's the string you have and what output you want? example please. – Arup Rakshit May 18 '13 at 17:58

2 Answers

2

the three spaces: [32,160,32]

Code point 160 (U+00A0) is a non-breaking space, usually found in HTML as `&nbsp;`. It lies outside the ASCII range, and apparently squish does not recognize it as a space. Try replacing it first:

string.gsub(160.chr, ' ').squish
tokland
  • Very clever, thank you. However I'm getting an encoding error (`Encoding::CompatibilityError: incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)`). I did add `# encoding: utf-8` at the top of the file. Any ideas? – Cornflex May 18 '13 at 16:51
  • It works for me in a console (1.9.3). Try with `"\xA0"` or check http://stackoverflow.com/questions/1942148/ruby-1-9-regular-expressions-with-unknown-input-encoding – tokland May 18 '13 at 16:57
  • Sorry, I've been off work for a while. Unfortunately, the issue still persists. I don't understand how the link you posted is supposed to help? `string.count('\xA0')` returns 13 in my example. However, when I call `string.gsub!('\xA0')` nil is returned, meaning that nothing is replaced. Weird. – Cornflex May 31 '13 at 16:17
  • The link is supposed to help because you have ASCII-160 chars in your string, so it's good to know what those chars stand for. I don't understand that gsub! command you ran, without the second argument. You said your string was [32, 160, 32], right? In a console: [32, 160, 32].map(&:chr).join.gsub(160.chr, '').squish => "" – tokland May 31 '13 at 18:42
  • Sorry, the method I actually called was `string.gsub!('\xA0','')`, with second argument. Yes, what you write works. However, it doesn't with my string. I think the error I pasted above means, that my string is UTF-8 but the regexp is ASCII!? I tried `string = string.encode('UTF-8')` and called gsub again, nothing different. Calling `string.gsub("\xA0".encode('UTF-8'),' ')` leaves me with this error: `RegexpError: invalid multibyte character: /\xA0/` – Cornflex May 31 '13 at 19:22
  • I found the solution. The Unicode code point for a non-breaking space is U+00A0, so I had to gsub /\u00a0/. Thanks for your help! http://stackoverflow.com/questions/2588942/convert-non-breaking-spaces-to-spaces-in-ruby – Cornflex May 31 '13 at 19:40
  • Actually solved this with `string.gsub(/[[:space:]]/,' ').squish` – Cornflex May 31 '13 at 19:52
  • it worked with POSIX-style? and failed with `string.gsub(/\s+/, ' ')`? – tokland May 31 '13 at 20:11
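
For readers landing here, a minimal sketch of what the comments above converge on. It assumes Ruby 1.9 or later and a UTF-8 string; the sample value and the variable names are made up for illustration, and a plain `gsub` stands in for `squish` so the snippet runs without ActiveSupport.

```ruby
# encoding: utf-8

# Rebuild the three "spaces" from the question out of their code points:
# 32 is a plain space, 160 is U+00A0 (no-break space).
string = "sth:" + [32, 160, 32].pack("U*") + "Sth"

# \s only matches ASCII whitespace, so U+00A0 is left in place.
ascii_only = string.gsub(/\s+/, " ")

# [[:space:]] is Unicode-aware on UTF-8 strings and also matches U+00A0.
unicode_aware = string.gsub(/[[:space:]]+/, " ")

# Targeting the character explicitly works as well.
explicit = string.gsub(/\u00A0/, " ").gsub(/\s+/, " ")

# Passing an encoding to Integer#chr yields a UTF-8 no-break space,
# which can be used as a gsub pattern on a UTF-8 string.
nbsp = 160.chr(Encoding::UTF_8)
with_chr = string.gsub(nbsp, " ").gsub(/\s+/, " ")

# A whitespace-only check built on [[:space:]] accepts U+00A0 even though
# it is not equal to " ". This is an assumption about how blank? behaves,
# not a copy of its implementation.
looks_blank = !(nbsp =~ /\A[[:space:]]*\z/).nil?

puts ascii_only == string      # true: nothing was collapsed
puts unicode_aware             # sth: Sth
puts looks_blank && nbsp != " "
```

Without an argument, `160.chr` normally returns a binary (ASCII-8BIT) string, which would explain the `Encoding::CompatibilityError` reported above when it is matched against a UTF-8 string.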
0
string.squish! 

Note that squish! modifies the string in place. Also, any whitespace-only string, such as " ", returns true for blank?.

deepthi