1

I wonder if anyone can help me. I have a bunch of text files that contain a few thousand lines, and I just want to extract one element of each file.

A snippet of the contents of the files is like so:

                    <LastMassUpdateChange xsi:nil="true" />
                    <Notes />
                    <PropertyType1>House</PropertyType1>
                    <PropertyType2>SemiDetached</PropertyType2>
                    <PositionOfFlat xsi:nil="true" />
                    <FlatWhichFloor>0</FlatWhichFloor>
                    <FlatFloorsAbove>0</FlatFloorsAbove>

Where I just want to extract the text between <PropertyType2> & </PropertyType2> So in this case SemiDetached and place this result next to the file url column.

The urls of the files will all be in a column within excel, so I need a loop vba to check each text file within that column, and put the result in the next column.

I had the following code to extract the data within a certain line, but I didn't realise the files were not all formatted with the same amount of lines so it hasn't worked out.

Any help greatly appreciated, thanks.

Sub extractpropertytype()
Dim d As Integer
' For d = 1 To Sheet2.Range("G" & Rows.Count).End(xlUp).Row
 For d = 2 To Range("AE1").Value + 1
 
 'Workbooks("Book1").Activate
 Open Range("AA" & d).Value For Input Access Read As #1
 For i = 1 To 80

 Line Input #1, X
 'Range("a1").Offset(i - 1, 0).Value = x
 Next i
 Line Input #1, X
 Range("AB" & d) = X
 Close #1
 
 Next d
End Sub
BigBen
  • 46,229
  • 7
  • 24
  • 40
rinse04
  • 47
  • 3
  • `If X Like "*" Then` – Tim Williams Feb 02 '21 at 22:27
  • Thanks Tim, where do I go from there? Mix if with my current code? Thanks – rinse04 Feb 02 '21 at 23:27
  • `If X Like "*" Then Range("AB" & d) = X` would be a good start. – Tim Williams Feb 02 '21 at 23:33
  • Hi Tim I've tried the following, but I can't seem to work out how to tell the code to search all the rows in the file? ```Sub extracthousetype2() Dim d As Integer For d = 2 To Range("AE1").Value + 1 Open Range("AA" & d).Value For Input Access Read As #1 For i = 1 To 80 Line Input #1, X Next i Line Input #1, X If X Like "*" Then Range("AB" & d) = X Close #1 Next d End Sub``` Thanks – rinse04 Feb 03 '21 at 08:19
  • If you need to post updated code it's best to edit your question and add it there: anything more than one line of code in comments isn't really readable – Tim Williams Feb 03 '21 at 16:16

1 Answers1

2

This reads all the lines into a string using a file system object and a regular expression to extract the value between the tags.

Option Explicit

Sub extractpropertytype()

    Dim wb As Workbook, ws As Worksheet
    Dim iRow As Long, iLastRow As Long
    Dim sXML As String, sFilename As String, sPath As String
   
    Dim Regex As Object, Match As Object
    Set Regex = CreateObject("vbscript.regexp")

    ' capture text between tags
    With Regex
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = "<PropertyType2>(.*)</PropertyType2>"
    End With
   
    ' file system object to read text
    Dim oFSO As Object, oFile As Object
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    
    Set wb = ThisWorkbook
    Set ws = Sheet2 ' change to suit
    sPath = wb.Path & "\"
   
    ' scan list of text files on turn
    iLastRow = ws.Range("AE1").Value + 1
    For iRow = 2 To iLastRow
      
       ' open file and read all lines
       sFilename = sPath & ws.Cells(iRow, "AA")
       Set oFile = oFSO.OpenTextFile(sFilename, 1)
       sXML = oFile.ReadAll
      
       ' extract value with regex
       If Regex.test(sXML) Then
           Set Match = Regex.Execute(sXML)
           ws.Cells(iRow, "AB") = Match(0).submatches(0)
       Else
           ws.Cells(iRow, "AB") = "No match"
       End If
      
       oFile.Close
      
    Next iRow
    MsgBox iLastRow - 1 & " files scanned", vbInformation

End Sub
CDP1802
  • 13,871
  • 2
  • 7
  • 17