1

I have a text file with 3 million lines of port data that I need to import into Excel, starting a new sheet ("LIST") after every 1 million lines. What's wrong with my code? Can you tell me?

Sub ReadFile()
  Dim I, J
  I = 1
  J = 1
  File = "\\bla.bla.bla\Desktop\vsr.txt"
  Open File For Input As #1
  Do While Not EOF(1)
    Line Input #1, MyString
    Sheets("LIST" & J).Cells(I, 1) = MyString
    If I = 1000000 Then
      I = 1
      J = J + 1
    Else
      I = I + 1
    End If
  Loop
  Close #1
End Sub
Martijn Pieters
Cepreu
  • *Welcome to [so]!* Before your next question (or if my answer below doesn't solve your issue) please check out the [tour] (you'll earn your first badge!) as well as '[ask]' and **how to create a [mcve]**. Also [here](https://codeblog.jonskeet.uk/stack-overflow-question-checklist/) is a handy checklist from the site's top user. – ashleedawg Aug 22 '18 at 07:11
  • The question could have included more information (see links above).. but **+1 anyway** for generating some conversation and multiple answers with varying methods to solve an issue like this. This is my favourite part of [so] since I learn from others, while I'm helping. :-) – ashleedawg Aug 22 '18 at 09:39

3 Answers

2

Perhaps something like this would be better. I couldn't test it since you didn't include sample data and I don't know what port data is, but it should work.

If there's a problem with it (or anytime you have a code problem) try using F8 to step through the code to find out exactly what's happening on each line. (More on that from Chip Pearson here.)

Option Explicit

Sub ReadFile()
    Const fName = "\\bla.bla.bla\Desktop\vsr.txt"
    Dim rowNum As Long, shtNum As Long, myString As String
    
    Open fName For Input As #1
    shtNum = 1
    Do While Not EOF(1)
        rowNum = rowNum + 1
        Line Input #1, myString
        Sheets("LIST" & shtNum).Cells(rowNum, 1) = myString
        If rowNum = 1000000 Then
            shtNum = shtNum + 1
            rowNum = 0
        End If
    Loop
    Close #1
End Sub

Some notes:

  • Use Option Explicit at the top of every module, always, to help locate variables that are not properly declared or handled (i.e., MyString was undeclared).
  • Personally, I try to avoid using common words like File for variables since they may or may not be "reserved words" (i.e., constants or function names built in to Excel).
  • Personally, I usually make the first letter of my variable lowercase, to help differentiate my variables from built-in keywords, at a glance.
  • Avoid using meaningless variables like I and J (especially when posting code for help from others). A couple extra characters make the code a lot easier to understand and to modify/debug later.
  • When declaring a variable, specify a data type whenever possible.
  • You didn't mention anything about creating worksheets, and your code doesn't attempt it, so this code assumes all the required output worksheets exist and are blank.
  • An alternative to using shtNum (or J in your example) would be to use Int(rowNum/1000000)+1 to get the sheet number. This formula returns 1 for 0 to 999999, 2 for 1000000 to 1999999, etc.
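To illustrate that last note, the whole procedure could be driven off a single zero-based line counter instead of two counters. This is an untested sketch, with the same assumption as above that sheets LIST1, LIST2, LIST3 already exist:

```vba
Option Explicit

Sub ReadFileSingleCounter()
    Dim lineNum As Long, myString As String
    Open "\\bla.bla.bla\Desktop\vsr.txt" For Input As #1
    Do While Not EOF(1)
        Line Input #1, myString
        'lines 0-999999 go to LIST1, 1000000-1999999 to LIST2, etc.
        'Int(lineNum / 1000000) + 1 picks the sheet; Mod picks the row within it
        Sheets("LIST" & (Int(lineNum / 1000000) + 1)).Cells((lineNum Mod 1000000) + 1, 1) = myString
        lineNum = lineNum + 1
    Loop
    Close #1
End Sub
```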
ashleedawg
  • honestly for something like this, with only 3 million lines of data (3 worksheets), if it's not something that needs to be done frequently and automatically, I'd probably just open the text file in Excel manually (or paste in the data) and do the six clicks manually to move parts of the set to different sheets.... but it's good coding practice. :-) – ashleedawg Aug 22 '18 at 07:13
  • apologise for my ignorance, but I thought that the maximum load for Excel was ~1,400,000 rows? As a result, you could not load 3m rows. I have a .bat workaround that splits the txt file, that works quite efficiently. – Lowpar Aug 22 '18 at 07:44
  • 1
    @Lowpar - ah you're correct, I hadn't considered the [maximum](https://support.office.com/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3) of 1,048,576 rows in Excel 2016; in that case if I was doing it manually I would paste into Notepad++ and go from there -- but only if it was an occasionally required task. Nothing wrong with coding and as I said, a simple procedure like this is an excellent way to begin learning the basics of VBA.... – ashleedawg Aug 22 '18 at 09:08
  • Also I'm confused, @Lowpar, are you the same person that asked the question, or do you both happen to be splitting 3million-row text files? – ashleedawg Aug 22 '18 at 09:10
  • ashleedawg, I did not ask the question. I just have experience of doing this in Excel, or at least finding solutions that circumvent the file not loaded completely in Excel. – Lowpar Aug 22 '18 at 09:15
  • @ashleedawg, Thank you. I could not attach the txt because it contains personal data. The code works — thanks for pointing out my mistakes. I'm just starting to learn VBA, so automation like this is a great help in my work. – Cepreu Aug 22 '18 at 13:37
1

@ashleedawg already wrote most of the important stuff; however, I would make another correction to the code.

The way you handle the data now means you read a line from the vsr.txt file, copy it into a string, and finally write it into a cell. This takes a lot of processing power and will be quite slow (especially the part where you access the cells). It would be far more efficient to copy all the text into an array and then paste it into the sheet all at once.

The WorksheetFunction.Transpose() is necessary since arrays in Excel use columns as the first dimension.
Inserting the code every 10,000 lines prevents the Transpose function from running into issues. I assume that since it is part of the WorksheetFunction code it may rely on some older code, or may even have been made this way specifically to run legacy code (the length of the return value must fit in an Integer, or something similar).
Using myWB makes the Range references fully qualified; not strictly necessary, but it can save you a lot of trouble down the line.
With GROUPSIZE you can adjust the size of the array block you import at once; a bigger number may give you faster processing speeds.

Option Explicit

Sub ReadFile()

    Const GROUPSIZE As Long = 10000 'ENTRIESPERPAGE must be evenly divisible by this
    Const ENTRIESPERPAGE As Long = 1000000
    Const fName As String = "\\bla.bla.bla\Desktop\vsr.txt"

    Dim rowNum As Long
    Dim rowOffset As Long
    Dim shtNum As Long
    Dim myString() As String
    Dim myWB As Workbook
    Dim Range2Fill As String

    Set myWB = ThisWorkbook
    ReDim myString(1 To GROUPSIZE)

    Open fName For Input As #1
    shtNum = 1
    rowNum = 0
    rowOffset = 0
    Do While Not EOF(1)
        rowNum = rowNum + 1
        Line Input #1, myString(rowNum)

        If rowNum = GROUPSIZE Then
            Range2Fill = "A" & rowOffset + 1 & ":A" & rowOffset + rowNum
            myWB.Worksheets(shtNum).Range(Range2Fill) = WorksheetFunction.Transpose(myString)
            Debug.Print "Sheet: " & shtNum, "Offset: " & rowOffset

            If rowOffset >= ENTRIESPERPAGE - rowNum Then
                rowOffset = 0
                shtNum = shtNum + 1
            Else
                rowOffset = rowOffset + GROUPSIZE
            End If

            rowNum = 0
        End If
    Loop

    'write the final partial block, in case the line count isn't an exact multiple of GROUPSIZE
    'trimming the array first keeps Transpose from padding the sheet with #N/A values
    If rowNum > 0 Then
        ReDim Preserve myString(1 To rowNum)
        Range2Fill = "A" & rowOffset + 1 & ":A" & rowOffset + rowNum
        myWB.Worksheets(shtNum).Range(Range2Fill) = WorksheetFunction.Transpose(myString)
    End If
    Close #1

End Sub

Additional notes:

  • Excel supports up to 1,048,576 rows per worksheet, but expect performance to drop drastically with that many data points, especially if you start calculating with them or using them in graphs.
  • There are better options for working on huge datasets like this; Origin, MATLAB, and DIAdem are a few that I know of and have used so far. A free Python-based alternative would be Spyder.
L8n
  • Good thinking +1 ...and further to your point, it *might* be even more efficient to read the entire file at once, for example with `Set objFSO = CreateObject("Scripting.FileSystemObject") : Set objTF = objFSO.OpenTextFile(fName, 1) : strIn = objTF.ReadAll` ...etc, and then using `Split` to get each of the 3 sets of data. I tested with a 3.5m line JSON file and it took a really long time for all of these methods (however my lines are "full")... If time/performance is an issue @cepreu, it might be best to stick with the batch file that you were using... just call it from VBA with `Shell`. – ashleedawg Aug 22 '18 at 09:30
  • @L8n, Thanks for your help too. I used your code and the data is inserted into only 16,960 cells; from there up to 1,000,000 all cells come out with the value #N/D (the localized #N/A), and this repeats on each sheet – Cepreu Aug 22 '18 at 14:32
  • @Cepreu I assume the `transpose` function is the culprit here, I'll update as soon as I know more – L8n Aug 22 '18 at 14:51
-1

Here is an alternative solution: save this code as a .bat file and it will split your file into chunks of the size (in MB) you select. That will let you load the data into Excel and create the 3 × 1M-row worksheets:

@REM Using: How to run a PowerShell script from a batch file
@REM and: How can I split a text file using PowerShell?

@PowerShell  ^

$upperBound =  100MB;  ^
$rootName = 'ENTER YOUR FILE PATH HERE. csv file';  ^
$from = $rootName;  ^
$fromFile = [io.file]::OpenRead($from);  ^
$buff = new-object byte[] $upperBound;  ^
$count = $idx = 0;  ^
try {  ^
    do {  ^
        'Reading ' + $upperBound;  ^
        $count = $fromFile.Read($buff, 0, $buff.Length);  ^
        if ($count -gt 0) {  ^
            $to = '{0}.{1}' -f ($rootName, $idx);  ^
            $toFile = [io.file]::OpenWrite($to);  ^
            try {  ^
                'Writing ' + $count + ' to ' + $to;  ^
                $tofile.Write($buff, 0, $count);  ^
            } finally {  ^
                $tofile.Close();  ^
            }  ^
        }  ^
        $idx ++;  ^
    } while ($count -gt 0);  ^
}  ^
finally {  ^
    $fromFile.Close();  ^
}  ^
%End PowerShell%

Note that the code splits on byte boundaries, so it will cut lines (cells) in half; you will need to fix the first and last row of each of the files produced. I understand this is not a VBA workaround, but it will work... Modify $upperBound to change the chunk size, e.g., 50MB, etc.
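As a side note, if the goal is to drive everything from Excel anyway, a .bat like this can be launched from VBA via Shell, as ashleedawg's comment above suggests. A minimal sketch (the .bat path is a placeholder; note that Shell returns immediately rather than waiting for the script to finish):

```vba
Sub RunSplitter()
    'Environ$("ComSpec") resolves to cmd.exe; /c runs the command then exits.
    'Shell is asynchronous - Excel does not wait for the split to complete.
    Shell Environ$("ComSpec") & " /c ""C:\placeholder\split.bat""", vbNormalFocus
End Sub
```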

Lowpar
  • 1
    nice PS; I'll steal that for my own purposes, but the OP said they are [already using a batch file](https://stackoverflow.com/questions/51960686/3-million-lines-of-port-data-in-excel-from-txt/51961328?noredirect=1#comment90874197_51961328) to split into multiple files; I think the goal here is to get the split data **into Excel** (using VBA). – ashleedawg Aug 22 '18 at 09:35