Here is a brief bit of background information:

Whilst designing a rule engine that compares various aspects of a form to desired inputs, I stumbled across the problem of having to check whether one number was larger than another. Easy enough with the > and < operators, right? Wrong: the two values I have to compare are strings. They have to be strings; that is not something that can change.

Wishing to see where the engine would fail, so I would know where to begin, I told it to compare "100" to "10,000". You can imagine my surprise when it correctly decided that 10k was larger. Then I realised, duh, it's just comparing string lengths, so I compared 1001 to 1000. Again it got it right: 1001 is larger.

But I was adamant that this should NOT work, so I kept hitting the engine with all sorts of scenarios, determined to watch it fail. After a colleague pointed out that the system can compare 1001 and 1000 as file names and order them properly, the newest thought was that it compared ASCII character values of some sort. Testing continued. This WOULD fail; I couldn't accept that it was capable of correctly working out which value was numerically greater when both were strings.

My next thought was that it was lining up the first character of each string and comparing values character by character. I finally succeeded when I tested 11,111 against 9,999 and it deemed 9,999 the greater. Perfect: I was near enough happy that it compared 1 to 9 and 9 won every time. Simple fix: prefix the shorter string with 0s.

Ran this new theory through the engine and once again it was happily calculating which was larger.
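The character-by-character theory and the zero-padding fix can be sketched as follows. This is written in Python rather than the engine's own language, and it assumes ordinal (code point) string comparison, which may not be what the rule engine actually uses; `pad_to_same_width` is a hypothetical helper.

```python
from typing import Tuple

def pad_to_same_width(a: str, b: str) -> Tuple[str, str]:
    """Left-pad the shorter digit string with zeros (hypothetical helper)."""
    width = max(len(a), len(b))
    return a.zfill(width), b.zfill(width)

# Unpadded: the first characters differ, and '1' < '9' decides the result.
print("11111" > "9999")   # False

# Padded: "11111" vs "09999", so '1' > '0' decides it the numeric way.
left, right = pad_to_same_width("11111", "9999")
print(left > right)       # True
```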

However, I'm still not convinced. There must be other pitfalls to this type of comparison, but I'm running short of cases to test my theory with. So my question to you Overflowers is: what scenarios do you think this could fail on?

Have you tried this before yourselves, comparing numbers when they are strings, and what pitfalls did you face? Have I covered them all, or am I overlooking some major ones?

I'm not convinced this method is foolproof. (Note, however, that I didn't test strings such as 100d against 10000, because there is validation in place to rule those out.)

Thanks in advance!

NOTE: I did do some googling and searching and I don't think this is covered by any question here. Yes, some are similar, but they are concerned with not wanting numbers in the string, or not wanting a string of only numbers, so I deemed this different enough to post.

Note 2: My specific question is: where will numeric comparisons fail when using strings of numbers instead of ints?

RhysW
  • Where will they fail as opposed to `int` comparison? Well, if it has been programmed correctly, they won't. – Adam Houldsworth Jul 20 '12 at 13:35
  • I think you have over-rationalised this, when you should be parsing to int, double etc and comparing THAT. – Polyfun Jul 20 '12 at 13:39
  • Again, I'm aware of the proper way of doing it, I even do it the proper way. I'm just curious about the limits of using strings; pure curiosity. – RhysW Jul 20 '12 at 13:41
  • http://stackoverflow.com/a/11052176/932418 – L.B Jul 20 '12 at 13:41
  • @RhysW - Seems like you wasted a LARGE amount of time trying to prove something you already knew. Why don't you just convert the strings that are numbers into numbers? If you already know the proper way to do this, I don't understand the purpose of this question. It seems like a complete waste of time, because it boils down to the fact you don't understand how strings are compared. – Security Hound Jul 20 '12 at 15:20
  • @ramhound I understand how strings are compared, and I understand and implement the right way of doing it. Why is everyone so against curiosity about what might happen if I WERE to do it this way? That's all it is: curiosity into the weird and wonderful, rather than robotically following what everyone else does without question... – RhysW Jul 20 '12 at 15:38
  • Besides, no reason to down vote because someone does not understand something. – Vitaliy Jul 21 '12 at 06:32

2 Answers

To compare numbers that are strings, it'd be best to turn them into numbers first, e.g. using int.Parse. It's much more foolproof, not to mention easier, to let someone else figure out all the culture-sensitive complexities of parsing a string into a number, and just use simple number comparison after that.

If the string isn't always a number, use int.TryParse and handle appropriately.
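A rough sketch of that approach, in Python rather than C#: `try_parse_int` is a hypothetical stand-in for `int.TryParse`, and falling back to plain string comparison when either value is not an integer is only one possible way to "handle appropriately". Note that Python's `int()` does not accept exactly the same inputs as `int.TryParse`, so treat this as an illustration of the shape, not of the parsing rules.

```python
from typing import Optional

def try_parse_int(text: str) -> Optional[int]:
    """Return the integer value of text, or None if it is not an integer.

    Hypothetical helper playing the role of int.TryParse.
    """
    try:
        return int(text)
    except ValueError:
        return None

def is_greater(a: str, b: str) -> bool:
    """Compare numerically when both values parse as integers.

    Otherwise fall back to ordinary string comparison (an assumed policy).
    """
    x, y = try_parse_int(a), try_parse_int(b)
    if x is not None and y is not None:
        return x > y
    return a > b

print(is_greater("11111", "9999"))   # True, compared as numbers
print(is_greater("Male", "Female"))  # True, compared as strings ('M' > 'F')
```

With this shape the rule itself never needs to know whether a field is numeric; the decision is made once, inside the comparison.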

Tim S.
  • Of course I'm aware of that one :P but the field that's being tested won't always be numeric. Sometimes it will be "Male", sometimes "100", so I can't use int.Parse, because trying to parse Male as an int would be ridiculous :P – RhysW Jul 20 '12 at 13:36
  • Consider `int.TryParse` then. I've edited my answer. – Tim S. Jul 20 '12 at 13:37
  • @RhysW: between Male and Female, which is greater? Why? Design your app and data correctly before writing an engine. Saving numbers and enums as strings is not a good idea if you need to parse and compare them... – Marco Jul 20 '12 at 13:39
  • Under normal scenarios it would be fine, but I'm using the workflow rule engine. Obviously I can edit the rule to do that and it works fine, but my curiosity overtook me and I wondered about other pitfalls with this approach. The less logic I need to decide whether to transform a value the better, so if comparing them as strings is going to work then I'm happy with it. – RhysW Jul 20 '12 at 13:40

Let me see if I understand your question:

You are basically asking whether the following holds:

Given positive integers n1 and n2 and their corresponding lexical representations L(n1) and L(n2), n1 < n2 if and only if L(n1) < L(n2)?

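If that is the question, then yes, it is correct, provided both representations have the same length (for example after left-padding the shorter one with zeros, as you describe). You can derive it from the definition of lexicographic ordering.

See: http://www.dartmouth.edu/~matc/DiscreteMath/III.5.pdf Definition III.5.2.

However, you did not mention whether all your integers are positive. For negative integers the exact opposite holds: the representation that compares as larger belongs to the smaller number.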
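A small check of both claims, sketched in Python under the assumption of ordinal string comparison and a fixed zero-padded width; `padded` and `order_preserved` are hypothetical helper names.

```python
import itertools

def padded(n: int, width: int) -> str:
    """Zero-padded decimal representation of a non-negative integer."""
    return str(n).zfill(width)

def order_preserved(values, width: int) -> bool:
    """True if numeric order and string order agree for every pair."""
    return all(
        (a < b) == (padded(a, width) < padded(b, width))
        for a, b in itertools.combinations(values, 2)
    )

# Equal-width, non-negative: string order matches numeric order.
print(order_preserved(range(0, 500), 3))  # True

# Without padding the property breaks: 9 < 10, but "9" > "10".
print("9" < "10")                         # False

# Negative numbers of equal width: the two orders are reversed.
print("-10" < "-20", -10 < -20)           # True False
```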

What worries me is the possible presence of strings that do not represent integers. Can you somehow avoid it? If not, this is a very dangerous way to go and can yield very unexpected behavior.

Vitaliy
  • Thank you, and yes, there is validation in place to allow only numeric values from those places. In the cases where I can't be sure, there are regex patterns and some snipping off of the useless bits to keep just the number. I would just be using ints, but sometimes the returned value is a string and must be a string, so it's a bit annoying. – RhysW Jul 20 '12 at 15:35
  • That is interesting. And it sounds like a code smell, but I can't be sure. Can you elaborate a bit on the use case? Sometimes there are external constraints that prevent this kind of conversion (from a performance perspective, for example). I am not saying this is the case, but you shouldn't judge so quickly. – Vitaliy Jul 21 '12 at 06:30