I have a problem which is giving me a headache. I really thought someone would have asked this already, but days of reading and testing have been fruitless.

I have a text file which starts:

"Determining profile based on KDBG search...

     Suggested Profile(s) : WinXPSP2x86, WinXPSP3x86 (Instantiated with WinXPSP2x86)"

(The blank line between the two is not an error and neither are the spaces before 'Suggested')

I need to read the line starting 'Suggested...' only and extract every unique word starting 'Win' and populate a combobox with them. (i.e. 'WinXPSP2x86' and 'WinXPSP3x86')

I know I need to use the 'StreamReader' class and probably a Regex, but, as a beginner, connecting it all together is beyond my knowledge at the moment.

Can anyone help? It would be much appreciated.

trickyb_uk
  • No need for Regex - simply split on space and check each resulting string to see if it starts with "Win". – Tim Jan 26 '16 at 18:42
  • If the file is not very large (say, less than 4MB, as a fairly arbitrary size) then you can read it all into an array in one go by using [File.ReadAllLines](https://msdn.microsoft.com/en-us/library/s2tte0y1%28v=vs.110%29.aspx). That gets you started with having the file data in your program. – Andrew Morton Jan 26 '16 at 18:42
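
To make the comments above concrete, here is a minimal sketch of picking out only the line that starts with 'Suggested' once the leading spaces are ignored. The module and function names (`FindSuggestedLine`, `FindSuggested`) and the commented-out file path are made up for illustration; the sample lines are the ones from the question.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module FindSuggestedLine
    ' Returns the first line that starts with "Suggested" once leading
    ' whitespace is ignored, or Nothing if no such line exists.
    Function FindSuggested(lines As IEnumerable(Of String)) As String
        Return lines.FirstOrDefault(
            Function(l) l.TrimStart().StartsWith("Suggested", StringComparison.Ordinal))
    End Function

    Sub Main()
        ' With a real file you could pass File.ReadAllLines(path) or
        ' File.ReadLines(path) instead; the path is up to you, for example:
        ' Dim lines = File.ReadAllLines("c:\temp\example.txt")
        Dim lines As String() = {
            "Determining profile based on KDBG search...",
            "",
            "     Suggested Profile(s) : WinXPSP2x86, WinXPSP3x86 (Instantiated with WinXPSP2x86)"}

        Dim suggested As String = FindSuggested(lines)
        If suggested Is Nothing Then
            Console.WriteLine("No 'Suggested' line found.")
        Else
            Console.WriteLine(suggested)
        End If
    End Sub
End Module
```

Searching for the line by its prefix, rather than assuming it is always the third line, should keep working if the tool ever prints an extra line before it.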

2 Answers

0

As some have already suggested:

  • Use System.IO.File.ReadAllLines, if the file is not too big
  • Iterate through the array of lines
  • For each line, use the Split method to split on space
  • Check the first three characters of each word

This works, but it still needs some error checking:

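        ' Read the whole file into an array, one element per line.
        Dim lines() As String = System.IO.File.ReadAllLines("c:\temp\example.txt")

        Dim lineWords() As String
        For Each line As String In lines
            ' Split on spaces. Note the c suffix: " "c is a Char, not a String.
            lineWords = line.Split(New Char() {" "c}, System.StringSplitOptions.RemoveEmptyEntries)

            For Each word As String In lineWords
                ' Keep words longer than three characters that begin with "Win" (any case).
                If word.Length > 3 Then
                    If word.Substring(0, 3).ToUpper = "WIN" Then
                        cmbWords.Items.Add(word)
                    End If
                End If
            Next
        Next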
LinusN
  • One more requirement from OP's post: "extract every **unique** word starting 'Win'" – Tim Jan 26 '16 at 19:02
  • @LinusN Thanks for the swift responses all... That's nearly exactly what I need. I can ignore the 'unique' issue - it's not critical, but I do need to remove superfluous characters - i.e. the comma after the first word containing 'Win' is dragged along, and so is the closing parenthesis after the third... Is there a simple way to identify the end of a word if it will always be an alphanumeric character...? Thanks again – trickyb_uk Jan 26 '16 at 19:37
  • Well, add the words to a List(of String) instead of the ComboBox Items collection, and before adding a new word, check if it already exists in the collection (Check out the List.Find function). Then assign the list to the DataSource property of the ComboBox. – LinusN Jan 26 '16 at 21:34
  • Regarding filtering out non-alphanumeric characters, check this out (http://stackoverflow.com/questions/3210393/how-do-i-remove-all-non-alphanumeric-characters-from-a-string-except-dash) – LinusN Jan 26 '16 at 21:36
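
The two suggestions above (keep a list and skip duplicates, and strip the stray punctuation) can be sketched roughly as follows. This is only an illustration: it uses `List.Contains` rather than `List.Find`, it assumes the only unwanted characters are the comma and the parentheses seen in the sample line, and the names `UniqueWinWords` and `ExtractWinWords` are invented here.

```vb
Imports System
Imports System.Collections.Generic

Module UniqueWinWords
    ' Splits a line on spaces, trims commas and parentheses from both ends
    ' of each word, and returns every distinct word starting with "Win".
    Function ExtractWinWords(line As String) As List(Of String)
        Dim result As New List(Of String)
        ' Assumption: these are the only characters that need stripping.
        Dim punctuation As Char() = {","c, "("c, ")"c}

        For Each raw As String In line.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
            Dim word As String = raw.Trim(punctuation)
            ' Case-insensitive prefix test, and skip words already collected.
            If word.StartsWith("Win", StringComparison.OrdinalIgnoreCase) AndAlso
               Not result.Contains(word) Then
                result.Add(word)
            End If
        Next
        Return result
    End Function

    Sub Main()
        Dim line As String = "     Suggested Profile(s) : WinXPSP2x86, WinXPSP3x86 (Instantiated with WinXPSP2x86)"
        For Each w As String In ExtractWinWords(line)
            Console.WriteLine(w)
        Next
        ' Prints:
        ' WinXPSP2x86
        ' WinXPSP3x86
    End Sub
End Module
```

In the form itself, the returned list could be assigned to the ComboBox's DataSource property (for example `cmbWords.DataSource = ExtractWinWords(line)`) instead of being written to the console.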
0
Imports System.IO
Public Class Form1

Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load

    ' VB is not case sensitive and e is already a parameter, so we will
    ' start new variables with the letter F.

    ' Read all lines of file into string array F.
    Dim F As String() = File.ReadAllLines("H:\Projects\35021241\Input.txt")
    ' F() is a 0 based array.  Assign the third line of the file to G.
    Dim G As String = F(2)
    ' On line 3 of file find starting position of the text "Win" and assign to H.
    ' Note that IndexOf is case sensitive here.
    ' TODO:  If it is not found H will be -1 and we should quit.
    Dim H As Integer = G.IndexOf("Win")
    ' Assign everything beginning at "Win" on line 3 to variable I.
    Dim I As String = G.Substring(H)
    ' The value placed in delimiter will separate remaining values in I.
    ' Place C after ending quote to represent a single character as opposed to a string.
    Dim Delimiter As Char = ","C
    ' J array will contain values left in line 3.
    Dim J As String() = I.Split(Delimiter)

    ' Loop through J array removing anything in parenthesis.
    For L = J.GetLowerBound(0) to J.GetUpperBound(0)
        ' Get location of open parenthesis.
        Dim ParenBegin As Integer = J(L).IndexOf("(")
        ' If no open parenthesis found continue.
        If ParenBegin <> -1 then
            ' Open parenthesis found.  Find closing parenthesis location
            ' starting relative to first parenthesis.
            Dim Temp As String = J(L).Substring(ParenBegin+1)
            ' Get location of ending parenthesis.
            Dim ParenEnd As Integer = Temp.IndexOf(")")
            ' TODO:  If there is no ending parenthesis, ParenEnd is -1 and only
            ' the open parenthesis is removed; the text after it is kept.
            ' Change to include text up to open parenthesis and after closing parenthesis.
            J(L) = J(L).Substring(0,ParenBegin) & J(L).Substring(ParenBegin + ParenEnd +2)
        End If
    Next L

    ' UnwantedChars contains a list of characters that will be removed.
    Dim UnwantedChars As String = ",()"""
    ' Check each value in J() for presence of each unwanted character.
    For K As Integer = 0 to (UnwantedChars.Length-1)
        For L = J.GetLowerBound(0) To J.GetUpperBound(0)
            ' Declare M here so scope will be valid at loop statement.
            Dim M As Integer = 0
            Do
                ' Assign M the location of the unwanted character or -1 if not found.
                M= J(L).IndexOf(UnwantedChars.Substring(K,1))
                ' Was this unwanted character found in this value?
                If M<>-1 Then
                    ' Yes - where was it found in the value?
                    Select Case M
                        Case 0  ' Beginning of value
                            J(L) = J(L).Substring(1)
                        Case J(L).Length - 1    ' End of value (last index).
                            J(L) = J(L).Substring(0, M)
                        Case Else   ' Somewhere in-between.
                            J(L) = J(L).Substring(0,M) & J(L).Substring(M+1)
                    End Select
                Else
                    ' No the unwanted character was not found in this value.
                End If
            Loop Until M=-1 ' Go see if there are more of this unwanted character in the value.
        Next L  ' Next value.
    Next K  ' Next unwanted character.

    ' Loop through all the values and trim spaces from beginning and end of each.
    For L As Integer = J.GetLowerBound(0) To J.GetUpperBound(0)
        J(L) = J(L).Trim
    Next L  

    ' Assign the J array to the combobox.
    ComboBox1.DataSource = J

End Sub

End Class
William M
  • That's close and definitely moves me along. As with the suggestion above, the first 'Win..' word includes the comma at the end and, oddly, the third entry into the combobox is '(Instantiated with WinXPSP2x86)', rather than just the 'WinXP...'. Thanks for the cooperation, though – trickyb_uk Jan 26 '16 at 19:52
  • What values are you expecting in the combobox? "WinXPSP2x86" and "WinXPSP3x86 (Instantiated with WinXPSP2x86" or something else? – William M Jan 26 '16 at 20:44
  • William, that's definitely got rid of the unwanted commas and parentheses - I'll need to study it to figure out how! It is still, however, adding the words 'Instantiated' and 'with' to the combobox, but I can't see why! – trickyb_uk Jan 26 '16 at 20:44
  • Sorry, we added comments at the same time... All i want in the combobox, ideally, from this text doc example, would be WinXPSP2x86 and WinXPSP3x86, however if the 'unique' aspect of the code is problematic, then the second instance of WinXPSP2x86 could also be listed – trickyb_uk Jan 26 '16 at 20:46
  • Using the code above I am getting 2 values in the combo box. They are "WinXPSP2x86" and " WinXPSP3x86 Instantiated with WinXPSP2x86". If this works I will comment or explain, but first we have to get something that works. :-) – William M Jan 26 '16 at 20:53
  • I reread your comment. I can remove anything in parenthesis including the parenthesis and trim the result. Is that what this needs to convert "WinXPSP3x86 (Instantiated with WinXPSP2x86)" into just "WinXPSP3x86"? – William M Jan 26 '16 at 20:56
  • William, yes, that's what i'm seeing here as well... now to get rid of 'Instantiated with WinXPSP2x86'! – trickyb_uk Jan 26 '16 at 20:56
  • I thought you had it there... There's a space in front of WinXPSP3x86. Can that be removed? Then it'll be perfection. And thanks again. – trickyb_uk Jan 26 '16 at 21:16
  • Sorted the space problem by adding a 'space' to the unwanted characters. That looks great, now. Thanks again. – trickyb_uk Jan 26 '16 at 21:23
  • Added a for...next to loop through each array element and truncate spaces. – William M Jan 26 '16 at 21:23
  • Great work. That really moves my little project along. If you ever get time to explain the code, I'm sure I'll learn a lot. It seems incredible that something that sounds so easy can look so complicated! Thanks. Richard – trickyb_uk Jan 26 '16 at 21:27
  • Richard, you are welcome! I added comments to help you learn. – William M Jan 27 '16 at 13:03
  • William, those annotations are a great help. Much clearer, albeit still way beyond me at the moment. Thanks again. – trickyb_uk Jan 27 '16 at 19:52
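
For anyone comparing approaches: the question mentioned Regex, and although neither answer uses it, a compact sketch along those lines might look like this. It is not what either answerer posted. It assumes a profile name is "Win" followed by letters, digits or underscores, and the names `RegexWinWords` and `ExtractWinWords` are invented for the example.

```vb
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module RegexWinWords
    ' Returns each distinct token that starts with "Win" at a word boundary.
    ' \w* stops at commas, spaces and parentheses, so no trimming is needed.
    Function ExtractWinWords(line As String) As String()
        Return Regex.Matches(line, "\bWin\w*").
            Cast(Of Match)().
            Select(Function(m) m.Value).
            Distinct().
            ToArray()
    End Function

    Sub Main()
        Dim line As String = "     Suggested Profile(s) : WinXPSP2x86, WinXPSP3x86 (Instantiated with WinXPSP2x86)"
        For Each w As String In ExtractWinWords(line)
            Console.WriteLine(w)
        Next
        ' Prints:
        ' WinXPSP2x86
        ' WinXPSP3x86
    End Sub
End Module
```

The resulting array could be assigned to `ComboBox1.DataSource` in the same way the J array is in the answer above. The pattern is case sensitive as written; `RegexOptions.IgnoreCase` could be passed as a third argument to `Regex.Matches` if that matters.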