
Consider a text file stored in an online location that looks like this:

    ;aiu;

[MyEditor45]
Name = MyEditor 4.5
URL = http://www.myeditor.com/download/myeditor.msi
Size = 3023788
Description = This is the latest version of MyEditor
Feature = Support for other file types 
Feature1 = Support for different encodings
BugFix = Fix bug with file open
BugFix1 = Fix crash when opening large files
BugFix2 = Fix bug with search in file feature
FilePath = %ProgramFiles%\MyEditor\MyEditor.exe
Version = 4.5

The file describes a possible update to an application, which the user could download. I want to load it into a StreamReader, parse it, and then build up lists of Features, BugFixes, etc. to display to the end user in a WPF ListBox.

I have the following piece of code, which gets my text file (first extracting its location from a local ini file) and loads it into a StreamReader. This at least works, although I know there is no error checking at present; I just want to establish the most efficient way to parse the file first. One of these files is unlikely to ever exceed about 250 to 400 lines of text.

' GetUrl() extracts the location of the update file from the local ini file.
Dim UpdateUrl As String = GetUrl()
Dim client As New WebClient()
Using myStreamReader As New StreamReader(client.OpenRead(UpdateUrl))

    While Not myStreamReader.EndOfStream
        Dim line As String = myStreamReader.ReadLine()
        ' Only "Key = Value" lines carry data.
        If line.Contains("=") Then
            Dim p As String() = line.Split(New Char() {"="c})
            ' For now, just show each bug fix.
            If p(0).Contains("BugFix") Then
                MessageBox.Show($" {p(1)}")
            End If
        End If
    End While
End Using

Specifically, I'm looking to collate the information about Features, BugFixes and Enhancements. Whilst I could construct what would in effect be a rather messy If statement, I feel sure there must be a more efficient way to do this, possibly involving LINQ. I'd welcome any suggestions.

I have added the WPF tag on the off chance that someone with more experience of displaying information in WPF ListBoxes than I have might spot a way to define the information I'm after so that it can easily be displayed in a WPF ListBox in three sections (Features, Enhancements and BugFixes).

Dom Sinclair
  • If the files will never exceed 400 lines, you shouldn't be obsessed with efficiency. Don't worry about optimization until you have observed actual performance problems at runtime. – 15ee8f99-57ff-4f92-890c-b56153 May 28 '16 at 16:07
  • @EdPlunkett Fair point, but it's always nice to try and learn new and, sometimes, more efficient ways to do things. – Dom Sinclair May 28 '16 at 16:14
  • Just a remark: you should use `line.Split({"="c}, 2, StringSplitOptions.None)` as otherwise you would lose any data after an "=" in the Description etc., e.g. you might have a line "Bugfix = solved error when x=1". See [String.Split Method (Char(), Int32, StringSplitOptions)](https://msdn.microsoft.com/en-us/library/ms131450%28v=vs.110%29.aspx). You may also want to consider the "Remarks" section there. – Andrew Morton May 28 '16 at 17:43
  • One little advice: use `Microsoft.VisualBasic.FileIO.TextFieldParser` to read the file and split the lines in fields. – Gert Arnold May 28 '16 at 18:12
  • @AndrewMorton That's a really good point. The txt file I'm parsing is written by the installer I use (I add the relevant lines for features etc. to it), and whilst I haven't had cause to use a second '=' yet, there may come a time when I do, and I had given no thought to that at all. Thank you. – Dom Sinclair May 28 '16 at 18:20
  • Possible duplicate of [Reading/writing an INI file](http://stackoverflow.com/questions/217902/reading-writing-an-ini-file) – Robert McKee May 28 '16 at 18:38
  • @RobertMcKee Not sure how you figure it's a duplicate of the question you refer to. I state quite clearly that I'm getting the URL of the online text file from a local ini file (logical conclusion: I can already read an ini file). Yes, I know that this file is to all intents and purposes like an ini file, but it's not quite. I'm really interested in the most efficient way to extract particular information. How to read it isn't the question; how to read it efficiently is. I know it's a short file, but lessons learnt here can be applied to bigger ones. – Dom Sinclair May 28 '16 at 19:37
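A minimal VB.NET sketch of the difference the count argument makes, using the hypothetical "x=1" line from Andrew Morton's comment. `SplitKeyValue` is an illustrative helper, not part of the question's code; as an assumption of this sketch, it returns an empty value for lines without an "=" instead of throwing.

```vbnet
Imports System

Module SplitWithCountDemo
    ' Hypothetical helper: splits "Key = Value" into at most two parts,
    ' so any further "=" characters stay inside the value.
    Function SplitKeyValue(line As String) As String()
        Dim parts = line.Split({"="c}, 2, StringSplitOptions.None)
        ' Lines without "=" (e.g. "[MyEditor45]") get an empty value.
        Return {parts(0).Trim(), If(parts.Length > 1, parts(1).Trim(), String.Empty)}
    End Function

    Sub Main()
        Dim line = "BugFix = solved error when x=1"

        ' Without a count, the text after the second "=" lands in a third element.
        Dim unlimited = line.Split("="c)
        Console.WriteLine(unlimited.Length)      ' 3
        Console.WriteLine(unlimited(1).Trim())   ' solved error when x

        ' With a count of 2, the value is kept whole.
        Dim kv = SplitKeyValue(line)
        Console.WriteLine(kv(0))                 ' BugFix
        Console.WriteLine(kv(1))                 ' solved error when x=1
    End Sub
End Module
```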

3 Answers


Dom, here is an answer in VB.NET and C#. First, since the file is small, read all of it into a list of strings. Then select the strings that contain an "=" and parse them into data items that can be used. This code returns a set of data items that you can then display as you like. If you have LINQPad, you can test the code below, or I have the code here: dotnetfiddle

Here is the VB.NET version (also on dotnetfiddle):

Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Module Program
    Public Sub Main()
        Dim fileContent As List(Of String) = GetFileContent()

        ' Keep only the "Key = Value" lines and parse each one into a DataItem.
        Dim dataItems = fileContent.Where(Function(c) c.Contains("=")).Select(Function(c) GetDataItem(c))

        ' In LINQPad you can call dataItems.Dump() instead of this loop.
        For Each item In dataItems
            Console.WriteLine($"{item.DataType}: {item.Data}")
        Next
    End Sub

    ' Hard-coded copy of the file content, standing in for the downloaded text.
    Public Function GetFileContent() As List(Of String)
        Return New List(Of String) From {
            "sb.app; aiu;",
            "",
            "[MyEditor45]",
            "Name = MyEditor 4.5",
            "URL = http://www.myeditor.com/download/myeditor.msi",
            "Size = 3023788",
            "Description = This is the latest version of MyEditor",
            "Feature = Support for other file types",
            "Feature1 = Support for different encodings",
            "BugFix = Fix bug with file open",
            "BugFix1 = Fix crash when opening large files",
            "BugFix2 = Fix bug with search in file feature",
            "FilePath = %ProgramFiles%\MyEditor\MyEditor.exe",
            "Version = 4.5"
        }
    End Function

    Public Function GetDataItem(value As String) As DataItem
        ' Split on the first "=" only, so any "=" inside the value is preserved.
        Dim parts = value.Split({"="c}, 2, StringSplitOptions.None)

        Return New DataItem With {
            .DataType = parts(0).Trim(),
            .Data = parts(1).Trim()
        }
    End Function
End Module

' Properties rather than fields, so that WPF data binding can use them.
Public Class DataItem
    Public Property DataType As String
    Public Property Data As String
End Class

Or, in C#:

using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        List<string> fileContent = GetFileContent();

        // Keep only the "Key = Value" lines and parse each one into a DataItem.
        var dataItems = fileContent.Where(c => c.Contains("="))
                                   .Select(c => GetDataItem(c));

        // In LINQPad you can call dataItems.Dump() instead of this loop.
        foreach (var item in dataItems)
            Console.WriteLine($"{item.DataType}: {item.Data}");
    }

    // Hard-coded copy of the file content, standing in for the downloaded text.
    public static List<string> GetFileContent()
    {
        return new List<string>
        {
            "sb.app; aiu;",
            "",
            "[MyEditor45]",
            "Name = MyEditor 4.5",
            "URL = http://www.myeditor.com/download/myeditor.msi",
            "Size = 3023788",
            "Description = This is the latest version of MyEditor",
            "Feature = Support for other file types",
            "Feature1 = Support for different encodings",
            "BugFix = Fix bug with file open",
            "BugFix1 = Fix crash when opening large files",
            "BugFix2 = Fix bug with search in file feature",
            "FilePath = %ProgramFiles%\\MyEditor\\MyEditor.exe",
            "Version = 4.5"
        };
    }

    public static DataItem GetDataItem(string value)
    {
        // Split on the first '=' only, so any '=' inside the value is preserved.
        var parts = value.Split(new[] { '=' }, 2, StringSplitOptions.None);

        return new DataItem
        {
            DataType = parts[0].Trim(),
            Data = parts[1].Trim()
        };
    }
}

// Properties rather than fields, so that WPF data binding can use them.
public class DataItem
{
    public string DataType { get; set; }
    public string Data { get; set; }
}
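To feed real data into this instead of the hard-coded GetFileContent list, the content only needs to be read line by line into a List(Of String). A sketch, assuming the text is available through a TextReader: in the question that would be the StreamReader wrapped around client.OpenRead(UpdateUrl), while here a StringReader with a few sample lines stands in for it. `ReadAllLines` is a hypothetical helper name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module ReadAllLinesDemo
    ' Hypothetical helper: reads every line from any TextReader
    ' (StreamReader, StringReader, ...) into a List(Of String).
    Function ReadAllLines(reader As TextReader) As List(Of String)
        Dim lines As New List(Of String)()
        Dim line As String = reader.ReadLine()
        ' ReadLine returns Nothing once the end of the text is reached.
        While line IsNot Nothing
            lines.Add(line)
            line = reader.ReadLine()
        End While
        Return lines
    End Function

    Sub Main()
        ' Sample content standing in for the downloaded file.
        Dim text = String.Join(Environment.NewLine,
                               {"[MyEditor45]", "Name = MyEditor 4.5", "Version = 4.5"})

        Using reader As New StringReader(text)
            Dim fileContent = ReadAllLines(reader)
            Console.WriteLine(fileContent.Count)   ' 3
        End Using
    End Sub
End Module
```

The resulting list can then be passed through the same Where/Select pipeline as GetFileContent().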
  • As you are proposing an answer, you could usefully incorporate the information in my comment to the OP ;) And as you have a VB translation, it would be better to replace your C# code with it here. – Andrew Morton May 28 '16 at 17:56
  • @RudyTheHunter Thank you for the example using LINQ. I'm in the process of combining that with the very valid point that Andrew made, which I really hadn't even considered (at least that's one less bug to look for!). – Dom Sinclair May 28 '16 at 18:35
  • I added the comment on the split to the code in the answer and on the dotnetfiddle example. –  May 28 '16 at 18:54

The given answer only focuses on the first part: converting the data to a structure that can be shaped for display. But I think your main question is how to do the actual shaping.

I used a somewhat different way to collect the file data, using Microsoft.VisualBasic.FileIO.TextFieldParser, because I think that makes coding just a little bit easier:

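Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports Microsoft.VisualBasic.FileIO

' Yields a (key, value) pair for every line of the file that contains the delimiter.
Iterator Function GetTwoItemLines(fileName As String, delimiter As String) _
        As IEnumerable(Of Tuple(Of String, String))
    Using tfp = New TextFieldParser(fileName)
        tfp.TextFieldType = FieldType.Delimited
        tfp.Delimiters = {delimiter}
        tfp.HasFieldsEnclosedInQuotes = False
        tfp.TrimWhiteSpace = False

        While Not tfp.EndOfData
            Dim arr = tfp.ReadFields()
            ' Lines with only one field (e.g. the [MyEditor45] header) are skipped.
            If arr.Length >= 2 Then
                ' Re-join everything after the first field so that extra delimiters survive.
                Yield Tuple.Create(arr(0).Trim(), String.Join(delimiter, arr.Skip(1)).Trim())
            End If
        End While
    End Using
End Function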

Effectively the same thing happens as in your code, but taking into account Andrew's keen caution about data loss: a line is split by = characters, but the second field of a line consists of all parts after the first part with the delimiter re-inserted: String.Join(delimiter, arr.Skip(1)).Trim().
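A small VB.NET sketch of just that re-joining step. String.Split stands in for TextFieldParser.ReadFields here (with TrimWhiteSpace = False it should return the same three fields for this line), and `ToKeyValue` is a hypothetical helper that mirrors the Yield expression.

```vbnet
Imports System
Imports System.Linq

Module RejoinDemo
    ' Hypothetical helper mirroring the Yield line in GetTwoItemLines:
    ' the first field is the key, all remaining fields are joined back
    ' together with the delimiter to form the value.
    Function ToKeyValue(fields As String(), delimiter As String) As Tuple(Of String, String)
        Return Tuple.Create(fields(0).Trim(), String.Join(delimiter, fields.Skip(1)).Trim())
    End Function

    Sub Main()
        ' Split stands in for ReadFields: "BugFix ", " solved error when x", "1".
        Dim fields = "BugFix = solved error when x=1".Split("="c)
        Dim kv = ToKeyValue(fields, "=")
        Console.WriteLine(kv.Item1)   ' BugFix
        Console.WriteLine(kv.Item2)   ' solved error when x=1
    End Sub
End Module
```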

You can use this function as follows:

Dim fileContent = GetTwoItemLines(file, "=")

For display, I think the best approach (most efficient in terms of lines of code) is to group the lines by their first items, removing the numeric part at the end:

Dim grouping = fileContent.GroupBy(Function(c) c.Item1.TrimEnd("0123456789".ToCharArray())) _
    .Where(Function(k) k.Key = "Feature" OrElse k.Key = "BugFix" OrElse k.Key = "Enhancement")
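A sketch of consuming the grouping, e.g. printing each section name followed by its items, which is the same nested loop a three-section display would need. The tuples are hard-coded in place of GetTwoItemLines(file, "=") so the example runs without a file, and `GroupSections` is a hypothetical wrapper around the expression above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module GroupingDemo
    ' Hypothetical wrapper: groups (key, value) tuples by key with trailing
    ' digits removed and keeps only the three sections of interest.
    Function GroupSections(items As IEnumerable(Of Tuple(Of String, String))) _
            As IEnumerable(Of IGrouping(Of String, Tuple(Of String, String)))
        Return items.GroupBy(Function(c) c.Item1.TrimEnd("0123456789".ToCharArray())) _
                    .Where(Function(k) k.Key = "Feature" OrElse k.Key = "BugFix" OrElse k.Key = "Enhancement")
    End Function

    Sub Main()
        ' Hard-coded stand-in for the output of GetTwoItemLines(file, "=").
        Dim fileContent = {
            Tuple.Create("Name", "MyEditor 4.5"),
            Tuple.Create("Feature", "Support for other file types"),
            Tuple.Create("Feature1", "Support for different encodings"),
            Tuple.Create("BugFix", "Fix bug with file open"),
            Tuple.Create("BugFix1", "Fix crash when opening large files")
        }

        ' GroupBy keeps groups in order of first appearance: Feature, then BugFix.
        For Each section In GroupSections(fileContent)
            Console.WriteLine(section.Key)
            For Each item In section
                Console.WriteLine("  " & item.Item2)
            Next
        Next
    End Sub
End Module
```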

Here's a LINQPad dump (in which I took the liberty of changing one item a bit to demonstrate correct handling of multiple = characters):

[Screenshot: LINQPad dump of the grouping]

Gert Arnold
  • Thank you Gert. This is a really nice example of using LINQ to extract the information, and should (when I can finally get my head around WPF ListBoxes) make displaying it to the end user relatively straightforward. – Dom Sinclair May 30 '16 at 10:53

You could do it with Regular Expressions:

Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Private Function InfoReader(ByVal sourceText As String) As List(Of Dictionary(Of String, String()))
    Dim digits = "0123456789".ToCharArray()
    Dim result As New List(Of Dictionary(Of String, String()))
    '1) make array of fragments for each product info (split just before each "[Section]" header)
    Dim products = Regex.Split(sourceText, "(?=\[\s*\w+\s*])")
    '2) iterate along fragments
    For Each product In products
        '3) work only in significant fragments ([Product]...)
        If Regex.IsMatch(product, "\A\[\s*\w+\s*]") Then
            '4) make array of property lines and extract dictionary of property/description;
            '   the description is everything after the first "=", trimmed (this also drops a trailing CR)
            Dim productProperties = Regex.Split(product, "(?=^\w+\s*=)", RegexOptions.Multiline).Where(
                Function(s) s.Contains("="c)
                ).ToDictionary(
                Function(s) Regex.Match(s, "^\w+(?=\s*=)").Value,
                Function(s) Regex.Match(s, "=(.*)").Groups(1).Value.Trim())
            '5) extract distinct property names, ignoring numbered repetitions
            Dim propertyNames = productProperties.Keys.Select(Function(s) s.TrimEnd(digits)).Distinct().ToArray()
            '6) make dictionary of distinctProperty/Array(Of String){description, description1, ...}
            Dim productGroupedProperties = propertyNames.ToDictionary(
                Function(s) s,
                Function(s) productProperties.Where(
                    Function(kvp) kvp.Key.TrimEnd(digits) = s
                    ).Select(
                    Function(kvp) kvp.Value).ToArray())
            '7) enlist dictionary to result
            result.Add(productGroupedProperties)
        End If
    Next
    Return result
End Function
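Step 1 is the least obvious part: the pattern is a zero-width lookahead, so Regex.Split cuts the text just before each [Section] header without consuming it. A small sketch of that step on its own; the second [MyEditor46] section is made up purely to show two products being separated, and `SplitSections` is a hypothetical helper.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SectionSplitDemo
    ' Hypothetical helper: splits INI-style text into fragments that each
    ' start at a "[Section]" header. The lookahead (?=...) matches an empty
    ' position, so the header itself is kept in the following fragment.
    Function SplitSections(text As String) As String()
        Return Regex.Split(text, "(?=\[\s*\w+\s*])")
    End Function

    Sub Main()
        ' [MyEditor46] is an invented second section, for illustration only.
        Dim text = String.Join(Environment.NewLine,
                               {";aiu;", "[MyEditor45]", "Name = MyEditor 4.5",
                                "[MyEditor46]", "Name = MyEditor 4.6"})

        ' Three fragments: the ";aiu;" preamble (skipped by step 3), then one per section.
        For Each fragment In SplitSections(text)
            Console.WriteLine("--- fragment ---")
            Console.WriteLine(fragment.Trim())
        Next
    End Sub
End Module
```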
VBobCat