Here's my code so far:

public void DeserialStream(string filePath)
{
    using (StreamReader sr = new StreamReader(filePath))
    {
        string currentline;
        while ((currentline = sr.ReadLine()) != null)
        {
            if (currentline.IndexOf("Count", StringComparison.CurrentCultureIgnoreCase) >= 0)
            {
                Console.WriteLine(currentline);
            }
        }
    }
}

How can I grab the comma-delimited values that appear after a term I searched for?

For example, suppose I have a CSV file that contains this data:

"Date","dd/mm/yyyy"
"ExpirationDate","dd/mm/yyyy"

"DataType","Count"
"Location","Unknown","Variable1","Variable2","Variable3"
"A(Loc3, Loc4)","Unknown","5656","787","42"
"A(Loc5, Loc6)","Unknown","25","878","921"

"DataType","Net"
"Location","Unknown","Variable1","Variable2","Variable3"
"A(Loc3, Loc4)","Unknown","5656","787","42"
"A(Loc5, Loc6)","Unknown","25","878","921"

How would I grab the table of values that comes after Count but before Net?

That is, I only want to parse the data in brackets:

"Date","dd/mm/yyyy"
"ExpirationDate","dd/mm/yyyy"

"DataType","Count"
["Location","Unknown","Variable1","Variable2","Variable3"
"A(Loc3, Loc4)","Unknown","5656","787","42"
"A(Loc5, Loc6)","Unknown","25","878","921"]

"DataType","Net"
"Location","Unknown","Variable1","Variable2","Variable3"
"A(Loc3, Loc4)","Unknown","5656","787","42"
"A(Loc5, Loc6)","Unknown","25","878","921"

I was thinking I might need a regular expression. Or is there an easier way that builds on the method above?

Kala J
  • You need to provide more information on how the CSV is formatted. If you want to use a regex, you need to provide all possible patterns the regex has to find. Also, how do you want to capture the values in the comma-delimited list? Would a simple `string[]` produced by `Split()` work? – Farhad Alizadeh Noori May 20 '14 at 15:28
  • How would I customize Split() so that only the text that appears after Count, but before the next similar field, is grabbed? In the example above, that means before "DataType","Net". – Kala J May 20 '14 at 15:30

2 Answers

You can use a regex like this:

\"DataType\"\,\"(?:Count|Net)\"((?!\"DataType\").)*

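This matches from a DataType line (Count or Net) up to, but not including, the next DataType line. For the match to span several lines, `.` has to be allowed to match newlines (`RegexOptions.Singleline` in .NET).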

Farhad Alizadeh Noori
  • That would work, the problem is I have a csv file that contains several DataTypes.... DataType A, B, C, D, E,.... and I just want info from DataType C to D. In the above example, I want to specify the Regex to include Count and Net since those are the only unique identifiers – Kala J May 20 '14 at 15:48
  • Btw, I have another question, if I had the same CSV format above but with a 21X23 grid of data, would the regular expression change? – Kala J May 20 '14 at 16:24
  • What's important with regular expressions is the format of the text. It doesn't matter if grid changes as long as the pattern the regex matches doesn't change. If it does I suggest you create a new question. – Farhad Alizadeh Noori May 20 '14 at 17:11
  • Btw, I have a small question, if I want to use the regular expression above, I would write something like this right: var countRegex = new Regex("\"DataType\"\,\"(?:Count|Net)\"((?!\"DataType\").)*"); However, in my Visual Studio, it's telling me that the comma is unrecognized escape sequence? How do I fix that? – Kala J May 20 '14 at 17:33
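  • Try putting a @ before your string. In a verbatim string the quotes have to be doubled rather than backslash-escaped, and the comma needs no escape: `var countRegex = new Regex(@"""DataType"",""(?:Count|Net)""((?!""DataType"").)*");` – Farhad Alizadeh Noori May 20 '14 at 18:02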
  • Actually, I think it just had to do with escaping the comma, so I removed the \ from the comma and it works. However, when I parse my CSV file, it only gives me "DataType","Count" and "DataType","Net" and not the table in between them. Not sure why. – Kala J May 20 '14 at 18:05
  • Use the `RegexOptions.Singleline` option. This would make "." match newline as well. – Farhad Alizadeh Noori May 20 '14 at 18:10
  • I will post a new question. Thanks! – Kala J May 20 '14 at 18:20
  • http://stackoverflow.com/questions/23767254/regex-returns-terms-and-not-information-in-between-terms – Kala J May 20 '14 at 18:25
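
Pulling the comment thread together, here is a minimal sketch of the regex approach restricted to the Count section only. It assumes the whole file has already been read into one string (for example with File.ReadAllText), and the class and method names are made up for illustration. The pattern is written as a verbatim string, so each literal quote is doubled, and RegexOptions.Singleline lets `.` run across line breaks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SectionRegexDemo
{
    // Matches the "DataType","Count" marker line, then captures everything
    // that follows until the next "DataType" marker (or the end of the text).
    // Inside @"...", a literal quote is written as "".
    private static readonly Regex CountSection = new Regex(
        @"""DataType"",""Count""\r?\n(?<body>(?:(?!""DataType"").)*)",
        RegexOptions.Singleline); // '.' also matches '\n'

    // Returns the lines between the Count marker and the next marker,
    // or null when the text has no Count section.
    public static string ExtractCountSection(string csvText)
    {
        Match m = CountSection.Match(csvText);
        return m.Success ? m.Groups["body"].Value.Trim() : null;
    }
}
```

A call such as `SectionRegexDemo.ExtractCountSection(File.ReadAllText(filePath))` would then return the header row and the data rows as one multi-line string, which still has to be split into lines and fields.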

You can use LINQ:

List<string> lines = File.ReadLines(path)
   .SkipWhile(l => l.IndexOf("\"Count\"", StringComparison.InvariantCultureIgnoreCase) == -1)
   .Skip(1) // skip the "Count"-line
   .TakeWhile(l => l.IndexOf("\"Net\"",   StringComparison.InvariantCultureIgnoreCase) == -1)
   .ToList();

Use String.Split to get a string[] for every line. In general, I would use an existing CSV parser, which handles edge cases and bad data, instead of reinventing the wheel.

Edit: If you want to split the fields into a List<string>, you should use a CSV parser as mentioned. Your data already uses a quoting character, so commas wrapped in " must not be treated as separators.
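
As one example of an existing parser, the framework ships TextFieldParser in the Microsoft.VisualBasic.FileIO namespace (on .NET Framework this needs a reference to Microsoft.VisualBasic.dll). The sketch below is only an illustration under that assumption; the class and method names are hypothetical.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsvLine
{
    // Splits a single CSV line into fields. Commas inside double quotes
    // stay part of the field instead of acting as separators.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields returns null when there is nothing to read
            // (for example, when the line is blank).
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

With this, the line `"A(Loc3, Loc4)","Unknown","5656","787","42"` should come back as five fields, the first being `A(Loc3, Loc4)`, whereas a plain `Split(',')` would produce six.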

However, here is another simple but efficient approach using a StringBuilder:

public static IEnumerable<string> SplitCSV(string csvString)
{
    var sb = new StringBuilder();
    bool quoted = false;

    foreach (char c in csvString)
    {
        if (quoted)
        {
            if (c == '"')
                quoted = false;
            else
                sb.Append(c);
        }
        else
        {
            if (c == '"')
            {
                quoted = true;
            }
            else if (c == ',')
            {
                yield return sb.ToString();
                sb.Length = 0;
            }
            else
            {
                sb.Append(c);
            }
        }
    }

    if (quoted)
        throw new ArgumentException("Unterminated quotation mark.", "csvString"); // message first, then parameter name

    yield return sb.ToString();
}

( thanks to https://stackoverflow.com/a/4150727/284240 )

Now you can use SelectMany in the query above to flatten out all tokens:

List<string> allTokens = File.ReadLines(path)
    .SkipWhile(l => l.IndexOf("\"Count\"", StringComparison.InvariantCultureIgnoreCase) == -1)
    .Skip(1) // skip the "Count"-line
    .TakeWhile(l => l.IndexOf("\"Net\"", StringComparison.InvariantCultureIgnoreCase) == -1)
    .SelectMany(l => SplitCSV(l.Trim()))
    .ToList();

Result:

Location, Unknown, Variable1, Variable2, Variable3, A(Loc3, Loc4), Unknown, 5656, 787, 42, A(Loc5, Loc6), Unknown, 25, 878, 921, ""
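
If the values are headed for a database table, keeping each row's fields together may be more convenient than one flat list. Here is a minimal sketch of that variation: it uses Select instead of SelectMany and drops blank lines, so the trailing empty entry disappears. The names CountTable, ReadRows and SplitFields are made up for illustration, and SplitFields is a compact stand-in for SplitCSV so the block compiles on its own (it does not report unterminated quotes).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CountTable
{
    // Returns one string[] per non-blank line found after the "Count"
    // marker line and before the "Net" marker line.
    public static List<string[]> ReadRows(IEnumerable<string> lines)
    {
        return lines
            .SkipWhile(l => l.IndexOf("\"Count\"", StringComparison.InvariantCultureIgnoreCase) == -1)
            .Skip(1) // skip the "Count" line itself
            .TakeWhile(l => l.IndexOf("\"Net\"", StringComparison.InvariantCultureIgnoreCase) == -1)
            .Where(l => !string.IsNullOrWhiteSpace(l)) // drop blank separator lines
            .Select(l => SplitFields(l.Trim()).ToArray())
            .ToList();
    }

    // Quote-aware splitter: a comma only ends a field outside quotes,
    // and the quote characters themselves are not kept.
    private static IEnumerable<string> SplitFields(string line)
    {
        var sb = new StringBuilder();
        bool quoted = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                quoted = !quoted;
            }
            else if (c == ',' && !quoted)
            {
                yield return sb.ToString();
                sb.Clear();
            }
            else
            {
                sb.Append(c);
            }
        }

        yield return sb.ToString();
    }
}
```

Calling `CountTable.ReadRows(File.ReadLines(path))` on the sample file should give three rows of five fields each: the header row followed by the two data rows.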
Tim Schmelter
  • Hmm I think I might have to use split... since in this case, I am also getting System.Collections.Generic.List`1[System.String] and a recursive script? it loops frequently (not sure why?) – Kala J May 21 '14 at 12:35
  • @KalaJ: what is your desired result at all? _"grab the table of values after Count but before Net"_ is not really helpful. – Tim Schmelter May 21 '14 at 12:40
  • Sorry, I fixed the recursive problem. I just want to make sure I'm adding the values of "Location","Unknown","Variable1","Variable2","Variable3" "A(Loc3, Loc4)","Unknown","5656","787","42" "A(Loc5, Loc6)","Unknown","25","878","921" into a list. Eventually this will go into one of my db tables. – Kala J May 21 '14 at 12:45
  • @KalaJ: what are the "values" you want, a `List` which contains entries like `Location` or `Unknown` or even a `Dictionary` where the key is `Location` and the value is `A(Loc3, Loc4)` (for example) ? – Tim Schmelter May 21 '14 at 14:30
  • @KalaJ: i've edited my answer to show the list approach. – Tim Schmelter May 21 '14 at 14:46