0

I'm using VBA in excel which finds if specific values are existing in files coded like XML files. The XML files are big (more than 100000 lines with few million characters) and to increase the speed I'm loading XML files to an array. Everything works for me when I use 100 XML files, however, if there are >200 XML files, the excel will consume too much RAM and the code fails with 'runtime error 7 out of memory'.

I'm thinking that I need just specific values from XML code which always starts with: =" and ends with: ", so if I remove everything else and keep only required text this will save a lot of consumed RAM.

For example, my XML file contains:

...
$<yiapcspvgdldm:Condition.ActionTypes>
<yiapcspvgdldm:ColorChange
    BrushStyle="H1"
    ColorChangeType="NormalColorChange"
    Color="#00FFFFFF"
    PropertyName="Foreground" />
<yiapcspvgdldm:Blinking
    PropertyName="Foreground" />
<yiapcspvgdldm:Set
    AttributeName="Visibility"
    AttributeType="System"
    To="{x:Static Visibility.Hidden}" />
</yiapcspvgdldm:Condition.ActionTypes>$
...

In this case I will need just:

H1
NormalColorChange
#00FFFFFF
Foreground
Foreground
Visibility
System
{x:Static Visibility.Hidden}

As I mentioned, each file contains >100000 lines, I tried to go through string's each line till EOF,but this takes ages... I tried SPLIT function, but this just splits the text and doesn't remove unwanted text. I tried to find my answer here, but without success. Any help will be much appreciated.

Here is my extracted SUB:

Dim GrapicFiles(), GrapicText() As String
Dim PrjtFolder as string

Sub LoadXML()
Dim i, GraphCount As Integer
Dim Path, FileName As String
Dim objFSO, objTF As Object
Dim strIn As Variant

PrjtFolder="C:\temp\"

If Worksheets("Work").FilterMode Then Worksheets("Work").ShowAllData
GraphCount = Application.WorksheetFunction.CountA(Worksheets("Work").Range("B:B")) - 1


For i = 1 To GraphCount
    DoEvents

    FileName = Worksheets("Work").Cells(i + 1, 2).Value
    Path = PrjtFolder & FileName & "\Main.xml"

        'Load files to array

        ReDim Preserve GrapicFiles(UBound(GrapicFiles) + 1)
        ReDim Preserve GrapicText(UBound(GrapicText) + 1)

            'Text Reading
            Set objFSO = CreateObject("Scripting.FileSystemObject")
            Set objTF = objFSO.OpenTextFile(Path, 1)
            strIn = objTF.readall
            objTF.Close
            Set objFSO = Nothing
            Set objTF = Nothing

    '>>>>>>>I will need something here to make my 'strIn' string smaller

        'saving to array
        GrapicFiles(i) = FileName
        GrapicText(i) = strIn
        Set strIn = Nothing

Next i

End Sub
Andy
  • 79
  • 1
  • 7
  • `Redim Preserve` is rather resource consuming operation. It allocates the memory of the new size, copies the original array, and then releases the memory taken by the original array. So it takes double memory. Why don't you `Redim` once before the loop? You know the count of the files. For semantics: if you search a file for specific values, you can use `InStr` function to decide. – AcsErno Jul 12 '19 at 10:38
  • @AcsErno Thanks for the suggestion, I tested, however, it doesn't improve too much the code. I have to load all files first and after to check using `InStr`. I was thinking before about this solution but is not the one which I need. Any other idea? Is there a method to go loop through my `strIn` variable and check each line against `="`? looping in the string line by line will be faster than reading line by line from the file. I tried to use `SPLIT` function and then use a `FOR EACH` but the size of an array using SPLIT becomes too big so it loops forever in the SPLIT function... – Andy Jul 12 '19 at 11:15
  • There is a good solution here by Argut (solution 3 using GET)(https://stackoverflow.com/questions/1376756/what-is-a-superfast-way-to-read-large-files-line-by-line-in-vba) but isn't specified how to manipulate string after getting it... – Andy Jul 12 '19 at 11:18

1 Answers1

1

This may help you parse the lines:

Dim GrapicText() As String
Dim sLine As String
Dim i As Long, iPos As Long

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTF = objFSO.OpenTextFile("C:\Users\acs.erno\Documents\Prog\Frm\x.xml", 1)
strIn = objTF.readall
objTF.Close
Set objFSO = Nothing
Set objTF = Nothing

GrapicText = Split(strIn, vbCrLf)    ' split to buffer
For i = LBound(GrapicText) To UBound(GrapicText)
    iPos = InStr(GrapicText(i), "=") 
    If iPos > 0 Then                 ' lines with "=" only
        sLine = Mid$(GrapicText(i), iPos + 2)
        iPos = InStrRev(sLine, """")    ' find terminal "
        If iPos > 1 Then sLine = Left$(sLine, iPos - 1)
        Debug.Print sLine
    End If
Next

And 1 more remark: Dim GrapicFiles(), GrapicText() As String declares GrapicFiles() as Variant. Write Dim GrapicFiles() As String, GrapicText() As String if you want it String

AcsErno
  • 1,597
  • 1
  • 7
  • 10
  • Thanks for the solution. This is realy what I was requested, but I think the idea to process each line is still bad... I should think a different source of comparison instead using XML files. Thanks for suggestion on 'variant' type of variable, done some improvement... – Andy Jul 17 '19 at 17:50
  • @Andy You might happen to find some downloadable XML parser but they do the same inside: search the line char by char for = sign and then separate the key and the value by the = sign. You could analyze how much time is required by the blocks by `debug.print time()` at the beginning and at the end of sub, and commenting out first the line processing section (leaving file open and readall), and then commenting out readall, too. My guess is that line processing section needs a lot less time than file open. – AcsErno Jul 17 '19 at 18:43