Some months ago, I wrote a small method in my program that searches for strings within a chosen column of a 2D array (of String type). It works perfectly, except that it fails badly on strings containing numbers or dot-separated numbers.

    private void gather_matches()
    {
        SearchFor = null;
        SearchFor = tb_text.Text.ToLower();
        Int32 Column = cb_main.SelectedIndex;
        Int32 Counter = 0;
        for (Int32 i = 0; i < DYL; i++)
        {
            if (Data[i, XUniprotID] == null) break;
            else
            {
                if (Data[i, Column] == null) continue;
                if (Data[i, Column].ToLower().Contains(SearchFor))
                {

                    for (Int32 j = 0; j < DXL; j++)
                    {
                        Found[Counter, j] = Data[i, j];

                    }
                    Counter++;
                }
            }
        }
    }

Very simple code, and it works except for those columns (yes, I checked that the index is still correct). This is the input: [image: the Data array displayed as a table]

When searching for "3" in the Cath Class column, it returns 3, 2, 1 and empty cells. When searching for "30" in Cath Architecture, it returns everything that contains a 3 and a 0. When searching for "3.40" in Cath Architecture, it reports that nothing was found.

What might be the problem? I haven't found anything online about `Contains` struggling with length or special characters.

Edit1:

How that data was created:

    private void cut_cath()
    {
        for (Int32 i = 0; i < DYL; i++)
        {
            if (Data[i, XUniprotID] == null) break;
            try
            {
                Datapath = startupPath + "\\cath+" + Data[i, XUniprotID] + ".txt";
                using (StreamReader Read = new StreamReader(Datapath))
                {
                    String Reader = Read.ReadToEnd();
                    String[] Parts = Regex.Split(Reader, "\t");
                    Data[i, Xcath] = Parts[0];
                    String[] CathParts = Parts[0].Split('.');
                    Data[i, XcaCl] = CathParts[0];
                    Data[i, XcaArch] = CathParts[0]+"."+CathParts[1];
                    Data[i, XcaTopo] = CathParts[0]+"."+CathParts[1]+"."+CathParts[2];
                    Data[i, XcaHomo] = Parts[0];
                    Data[i, XcaDom] = Parts[1];
                    Read.Close();
                }
            }
            catch
            {
                continue;
            }
        }

    }

Edit2:

Output when searching for "3.40" in the Cath Architecture column: [image]

As you can see, it's mostly correct, but some rows that don't match are still there.

Edit3:

Added Code:

     public bool Kontainser(String Value, String Input) //yeah, I know, stupid name...
     {
         return Input.IndexOf(Value, StringComparison.OrdinalIgnoreCase) >= 0;
     }

[...]

                if (Data[i, Column] == null) continue;

                if (Kontainser(SearchFor, Data[i, Column]))
                {

                    for (Int32 j = 0; j < DXL; j++)
                    {
                        Found[Counter, j] = Data[i, j];

                    }
                    Counter++;
                }

Now it works perfectly for half of the search and then decides to ignore the IF. The search was "3.40.50" in the CathTopology column.

Output: [image]

All that drama just in these CATH and Genome3D columns... nowhere else.

MeepMania
  • You should prepare a small piece of code which reproduces your problem: *Questions concerning problems with code you've written must describe the specific problem, and include valid code to reproduce it, in the question itself*. It would make answering the question easier. If you think `string.Contains` causes the problem, how you get the values does not matter. Can you post just the `string.Contains` call with proper input and incorrect output? – MarcinJuraszek Dec 30 '13 at 20:02
  • What makes you think "3.40" is the "value"? Perhaps that is the display format and NOT the raw value of 3.4. – T McKeown Dec 30 '13 at 20:03
  • Don't store numbers as strings. You're setting yourself up for great pains. Store numbers as numbers. – Servy Dec 30 '13 at 20:05
  • It is really difficult to identify the problem without seeing the actual input values, but as a blind shot I suggest you trim the input variables (at least in the comparisons). For example, `Data[i, Column].ToLower().Trim().Contains(SearchFor.Trim())` seems safer than your version. – varocarbas Dec 30 '13 at 20:05
  • @Servy It looks like the "numbers" really are strings, not values, as they progress a, a.b, a.b.c, a.b.c.d. – Andrew Morton Dec 30 '13 at 20:13
  • What is the mysterious `DXL` variable? – Andrew Morton Dec 30 '13 at 20:15
  • DXL is the X-Axis of that array. – MeepMania Dec 30 '13 at 20:26
  • AND I SOLVED IT... can't believe it was that simple... `String Helper = Data[i, Column].ToLower(); if (Helper.Contains(SearchFor))` I added just one line of code out of the blue. It seems that `ToLower()` and `Contains()` had a little conflict. oO Although it still does some strange stuff when it meets queries that don't match exactly... – MeepMania Dec 30 '13 at 20:29
  • I think you need to set a breakpoint and step through the code. `Contains` works fine with "numbers" and decimal points - you need to confirm what values you are getting along the way, and find the point of failure. My guess is that Data[] returns a value that you are not expecting (we have no way of determining your data types without you providing them, but perhaps they are in fact returning floating point numbers that aren't exactly 3.40 or whatever). – Wonko the Sane Dec 30 '13 at 20:33
  • And before I forget: the table up there is input. It's the array (Data[,])displayed as table. – MeepMania Dec 30 '13 at 20:33
  • Again, put in some breakpoints and see *why* the IF condition is not met (I bet it isn't just that it "decides to ignore the IF"). – Wonko the Sane Dec 30 '13 at 21:30
  • The Y-axis of that array is 1103 long. That's at least 500 iterations to get to that point. – MeepMania Dec 30 '13 at 21:45
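
As a sketch of the isolated check the comments above ask for: the snippet below calls `Contains` directly on a few made-up values in the same `a`, `a.b`, `a.b.c` shape, and then tries the trimming idea from the comments. The sample values and the `TrimmedContains` helper are assumptions for illustration and are not taken from the asker's data or program.

```csharp
using System;

static class ContainsRepro
{
    // Hypothetical helper: ordinal substring test after trimming both sides.
    public static bool TrimmedContains(string cell, string searchFor)
    {
        if (cell == null || searchFor == null) return false;
        return cell.Trim().Contains(searchFor.Trim());
    }

    static void Main()
    {
        // Made-up sample cells, not the asker's real data.
        string[] cells = { "3", "3.40", "3.40.50", "2.60", "1.10.8" };

        foreach (string cell in cells)
        {
            // Contains treats digits and dots like any other characters.
            Console.WriteLine("{0,-8} contains \"3.40\": {1}", cell, cell.Contains("3.40"));
        }

        // A trailing space in the search term is part of the term,
        // so this untrimmed search does not match.
        Console.WriteLine("3.40.50".Contains("3.40 "));

        // Trimming both sides first makes the same search match.
        Console.WriteLine(TrimmedContains("3.40.50", "3.40 "));
    }
}
```

If a check like this behaves as expected on real cell values, the substring test itself is probably not the failing part, and stepping through the surrounding loop with a breakpoint, as suggested above, is the next thing to try.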

1 Answer

AND I SOLVED IT... can't believe it was that simple... `String Helper = Data[i, Column].ToLower(); if (Helper.Contains(SearchFor))` I added just one line of code out of the blue. It seems that `ToLower()` and `Contains()` had a little conflict. oO Although it still does some strange stuff when it meets queries that don't match exactly...

In the future, you can do a case-insensitive `bar.Contains(foo)` like this:

    if (bar.IndexOf(foo, StringComparison.OrdinalIgnoreCase) >= 0)
    {

    }

Internally, the code for `Contains(string value)` is as follows (retrieved from the reference source):

    [TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
    [__DynamicallyInvokable]
    public bool Contains(string value)
    {
      return this.IndexOf(value, StringComparison.Ordinal) >= 0;
    }

So the performance should be fairly close to just using `Contains` itself, and will likely be much better than using `ToLower()`.
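
A minimal sketch of how that check could be wrapped for reuse, assuming a null-safe static helper is wanted. The name `ContainsIgnoreCase` and the sample strings are hypothetical; only `String.IndexOf(string, StringComparison)` is the real framework call.

```csharp
using System;

static class SearchHelpers
{
    // Hypothetical helper: case-insensitive substring test without ToLower(),
    // so no lowercased copy of the input is allocated for each comparison.
    public static bool ContainsIgnoreCase(string input, string value)
    {
        if (input == null || value == null) return false;
        return input.IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    static void Main()
    {
        // Letters match regardless of case.
        Console.WriteLine(ContainsIgnoreCase("Rossmann fold", "ROSS"));

        // Digits and dots have no case, so they are compared as-is.
        Console.WriteLine(ContainsIgnoreCase("3.40.50", "3.40"));
        Console.WriteLine(ContainsIgnoreCase("3.30.70", "3.40"));

        // A null cell is treated as "no match" instead of throwing.
        Console.WriteLine(ContainsIgnoreCase(null, "3.40"));
    }
}
```

Folding the null check into the helper is a design choice for this sketch; the loop in the question already skips null cells before comparing.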

Scott Chamberlain
  • I have to take the "I solved it" back and change it to "It works better now, but not totally". It looked like it was working at first. – MeepMania Dec 30 '13 at 20:39
  • Try using the `IndexOf` + `OrdinalIgnoreCase` instead of `Contains` and see if that solves it. – Scott Chamberlain Dec 30 '13 at 20:39
  • It works nicely but now the function produces some really weird stuff... I'll put it in the edit. – MeepMania Dec 30 '13 at 20:55
  • Regarding `StringComparison.OrdinalIgnoreCase` one might also consider `StringComparison.InvariantCultureIgnoreCase` if the text may contain diacritical marks which can be made in more than one way [like so](http://stackoverflow.com/a/3679927/119477). Although probably not the case here. – Conrad Frix Dec 30 '13 at 22:16