
I have to use Excel for data analysis of .csv files of around 80–100 MB. To speed up the process I'm trying to create a tool to automate some of it, but since it deals with large amounts of data (approx. 500k rows) it's incredibly slow when reading the file initially.

Does anybody have any suggestions that I could use? I'm currently using a button for the user to select and load the file. I've put my code below for that.

Dim ws As Worksheet, fileNm As String

Set ws = ActiveWorkbook.Sheets("CSV Data")

' Ask the user to pick the CSV file
fileNm = Application.GetOpenFilename("CSV Files (*.csv),*.csv", , "Please select CSV file...")

If fileNm = "False" Then
    Exit Sub
Else
    ' Import the CSV via a QueryTable starting at A1
    With ws.QueryTables.Add(Connection:="TEXT;" & fileNm, Destination:=ws.Range("A1"))
        .TextFileParseType = xlDelimited
        .TextFileCommaDelimiter = True
        .Refresh
    End With
End If
Matt Griffin

2 Answers


I suggest you look into Power Query. It is designed for the purpose of loading and manipulating large data in Excel.

Go to the Data ribbon and click "Get Data"-->"From File"-->"From Text/CSV"

In the window that opens, select your file and click "Import". In the next window you have the option to "Load", which will import the data as is, or "Edit", which is what you should do.

Now you are in the Power Query Editor, where you can manipulate your data, e.g. rename columns, delete/add columns, change data types, or apply mathematical operations.

For an overview of what Power Query can do and how it works, see this description from https://www.excelcampus.com, which in general has great tutorials.

For Power Query go here
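If you later want to drive this from VBA (as with the button in the question), newer Excel versions (2016+) expose the workbook's query collection, so you can register the same import programmatically. A sketch, assuming Excel 2016+; the file path and query name are made-up placeholders:

```vba
Sub AddCsvQuery()
    ' Hypothetical path; Table.PromoteHeaders uses the first CSV row as headers
    Dim m As String
    m = "let" & vbCrLf & _
        "    Source = Csv.Document(File.Contents(""C:\Exports\export.csv""), [Delimiter="","", Encoding=65001])," & vbCrLf & _
        "    Promoted = Table.PromoteHeaders(Source)" & vbCrLf & _
        "in" & vbCrLf & _
        "    Promoted"
    ' Register the query; it then appears in the Queries & Connections pane
    ActiveWorkbook.Queries.Add Name:="CsvImport", Formula:=m
End Sub
```

Once registered, the query can be loaded to a sheet or refreshed like any other Power Query source.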

FloLie

At least as far as loading the data into Excel is concerned, if you don't need a live connection to the data, it might be more efficient to do the following:

  1. Open an ADO connection to the text file
  2. Execute an SQL statement against the text file, returning an ADO recordset. You can also include only specific columns, or otherwise reshape the data (filtering, grouping, ordering) with the SQL statement before returning it
  3. Use Excel's CopyFromRecordset method to paste the data into the worksheet

Something similar to this; even though the source is another Excel workbook, the principle is the same.
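A minimal sketch of steps 1–3, using late binding so no ADO reference is required; the folder path, file name, and column names are hypothetical placeholders for your own data:

```vba
Sub LoadCsvViaADO()
    Dim conn As Object, rs As Object
    Dim ws As Worksheet
    Set ws = ActiveWorkbook.Sheets("CSV Data")

    ' The text driver connects to the FOLDER; each CSV in it acts as a table
    Set conn = CreateObject("ADODB.Connection")
    conn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
              "Data Source=C:\Exports\;" & _
              "Extended Properties=""text;HDR=Yes;FMT=Delimited"";"

    ' Select only the columns you need, and filter/reshape in SQL
    Set rs = conn.Execute("SELECT Col1, Col2 FROM [export.csv] WHERE Col1 > 0")

    ' Paste the entire recordset in one operation
    ws.Range("A2").CopyFromRecordset rs

    rs.Close
    conn.Close
End Sub
```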

Regarding the other issues you mentioned:

  • deleting columns -- Your SQL statement could select only the columns that you need
  • coloring certain rows -- Perhaps conditional formatting might be more efficient?
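For the conditional-formatting idea, a sketch; the range, column, threshold, and fill color here are made-up examples, not anything from the question:

```vba
' Hypothetical rule: highlight rows where the value in column C exceeds 100
With ActiveWorkbook.Sheets("CSV Data").Range("A2:Z500000").FormatConditions _
        .Add(Type:=xlExpression, Formula1:="=$C2>100")
    .Interior.Color = RGB(255, 235, 156)  ' light yellow fill
End With
```

Because the formula anchors the column with `$C2`, the rule evaluates per row and colors the whole row when that row's column C passes the test.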
Zev Spitz
  • Thanks Zev. I don't need a live connection to the data really as they're just a collection of .csv files generated from exports and not in a live database as such. Can't believe I forgot about conditional formatting though! Definitely be using that. – Matt Griffin Apr 24 '18 at 12:39
  • @MattGriffin Not speaking from experience, but conditional formatting might consume a lot of CPU when scrolling up and down. If that is the case, I would try to first set the conditional formatting, and then somehow paste the resulting formatting into all the cells. – Zev Spitz Apr 24 '18 at 12:59