
I've got a problem that is proving a tough nut to crack. I am using HtmlAgilityPack to read in an HTML page and XPath to select a couple of elements I need. This works fine.

Using XPath, I'm also trying to select the number in this DIV (441676).

    <div class="info">
        Money:
        441 676,-<br>
    </div>

I manage to select the number and strip the white space from it using the method from "Fastest way to remove white spaces in string".

But whatever I do, the white space between the 441 and 676 won't disappear. Removing white space in other places works just fine. It is ONLY between the digits that it doesn't work. Does anyone know what I'm missing here?

Rupal
  • What is the char code of this space? More than one character can produce a space; the usual one is 0x20. – Philip Daubmeier Jun 19 '12 at 12:56
  • Perhaps it isn't the "usual" whitespace character, [there are many](http://en.wikipedia.org/wiki/Whitespace_character). – Adam Houldsworth Jun 19 '12 at 12:56
  • Why don't you try this in the final stage: ("441 676").Replace(" ", ""); – Rumplin Jun 19 '12 at 12:57
  • @Rumplin It wouldn't find the whitespace, otherwise the linked method would work. – Adam Houldsworth Jun 19 '12 at 12:57
  • I think it should work; it has nothing to do with the numbers. – Asif Mushtaq Jun 19 '12 at 12:58
  • @Asif Fair enough, I took his code sample literally, but appended to the string without hardcoding the numbers would be fine. That said, if this solution would work, so would the linked solution already in use - meaning this question wouldn't exist. – Adam Houldsworth Jun 19 '12 at 12:59
  • @Rumplin: I've tried that, it doesn't work. – Rupal Jun 19 '12 at 13:03
  • @PhilipDaubmeier How can I find "which" white space character it is? – Rupal Jun 19 '12 at 13:04
  • @Rupal: e.g. just do a ``yourstring.ToCharArray().Select(x=>(byte)x).ToArray()`` to get a byte array from the string and look at it in the debugger. If it is a usual space (hex: 0x20) it should say 32 (0x20 in decimal) at the respective position. – Philip Daubmeier Jun 19 '12 at 13:06
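
The inspection suggested in the last comment can also be done without the debugger. The following is only a minimal sketch: the helper name `Describe` and the sample string are assumptions made for illustration, not code from the thread. It prints the code point of every character and flags the ones .NET treats as white space.

```csharp
using System;

// Hypothetical helper: format one character as U+XXXX, flagging white space.
static string Describe(char c)
{
    string flag = char.IsWhiteSpace(c) ? " (whitespace)" : "";
    return $"U+{(int)c:X4}{flag}";
}

// Assumed sample: the digits from the question separated by a non-breaking space.
string sample = "441\u00A0676";

foreach (char c in sample)
{
    Console.WriteLine(Describe(c));
}
```

An ordinary space shows up as U+0020, whereas a non-breaking space shows up as U+00A0. Both are flagged as white space, but only the first one matches the `' '` literal.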

2 Answers


It looks to me like you are dealing with a non-breaking space. Using the method you linked to, I have two suggestions for you.

The first is to add the non-breaking space character, '\u00A0', to your toExclude array:

    var str = s.ExceptChars(new[] { ' ', '\t', '\n', '\r', '\u00A0' });

Note: You should probably move the array to a static readonly field, since it never changes and you don't want to reallocate it every time you call this function.

Another alternative would be to update your ExceptChars function to use the Char.IsWhiteSpace function, as follows:

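    // Requires: using System.Collections.Generic; using System.Text;
    // Note: toExclude is kept so existing call sites compile, but it is no longer consulted.
    public static string ExceptChars(this string str, IEnumerable<char> toExclude)
    {
        StringBuilder sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            // Char.IsWhiteSpace covers Unicode white space, including '\u00A0'.
            if (!Char.IsWhiteSpace(c))
                sb.Append(c);
        }
        return sb.ToString();
    }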
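
To illustrate the effect of the Char.IsWhiteSpace variant, here is a small self-contained sketch. The function name `RemoveWhiteSpace`, the sample input, and the final trimming and parsing steps are assumptions added for illustration; they are not part of the linked method.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical stand-in for the whitespace-stripping approach above.
static string RemoveWhiteSpace(string str)
{
    var sb = new StringBuilder(str.Length);
    foreach (char c in str)
    {
        // char.IsWhiteSpace is true for ' ', '\t', '\n', '\r' and '\u00A0', among others.
        if (!char.IsWhiteSpace(c))
            sb.Append(c);
    }
    return sb.ToString();
}

// Assumed input: the selected text, with a non-breaking space between the digit groups.
string raw = "  441\u00A0676,-  ";
string stripped = RemoveWhiteSpace(raw);

// Drop the trailing ",-" before parsing (an assumption about the desired result).
string digits = stripped.TrimEnd(',', '-');
int amount = int.Parse(digits, CultureInfo.InvariantCulture);

Console.WriteLine(amount);
```

Because the check is by Unicode category rather than by an explicit list, other unusual separators such as a thin space would presumably be removed as well, with no need to extend the array.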
Jon Senchyna
  • I used your solution instead. Including '\u00A0' fixed the problem WITHOUT the need to create a new method as I did. Thanks! – Rupal Jun 19 '12 at 13:12

All right, I solved it this way. Starting from the ExceptChars method in "Fastest way to remove white spaces in string", I wrote an "AllowedChars" method that keeps only the given characters. Like this:

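    // Requires: using System.Collections.Generic; using System.Linq; using System.Text;
    public static string AllowedChars(string str, IEnumerable<char> toInclude)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            // Keep only whitelisted characters; everything else is dropped.
            if (toInclude.Contains(c))
                sb.Append(c);
        }
        return sb.ToString();
    }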

Then use the method like this:

    string money_fixed = AllowedChars(money, new HashSet<char>(new[] { '1', '2', '3', '4', '5', '6', '7', '8', '9', '0' }));
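
The same whitelist idea can be written more compactly. The sketch below is an assumed variation, not the code above: the name `KeepAsciiDigits` and the sample string are hypothetical. It keeps only the ASCII digits '0' to '9' and then parses the result.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: keep only ASCII digits, dropping everything else.
// A range check is used instead of char.IsDigit, which also accepts non-ASCII digits.
static string KeepAsciiDigits(string s) =>
    new string(s.Where(c => c >= '0' && c <= '9').ToArray());

// Assumed input: label, non-breaking spaces and the trailing ",-" all present.
string money = "Money:\u00A0441\u00A0676,-";
string digitsOnly = KeepAsciiDigits(money);

// NumberStyles.None accepts digits only, so any stray character would fail loudly.
int amount = int.Parse(digitsOnly, NumberStyles.None, CultureInfo.InvariantCulture);

Console.WriteLine(amount);
```

Note that a whitelist also merges digits from unrelated numbers in the same string, and an input without any digits yields an empty string that int.Parse rejects, so this is probably best applied to text that is known to contain a single amount.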
Rupal