
I currently have a bunch of huge CSV files on my server (one of them is over 3 GB) that I need to parse and show in a template. Since this looked like a JavaScript job, I looked into PapaParse, but it seems I have to pass a File object to its parse function. The Mozilla documentation for File says those objects are created when a user uploads a file or from the HTML5 canvas element. That's not what I want: the file is already on my server, and I just need to read through it and display the contents in a tabular format.

I tried the manual approach of simply parsing the entire file in Django and passing the result to an AJAX callback in the template, but the browser froze and I had to restart the server.

Sidharth Samant
  • You won't be able to handle a 3GB file in its entirety this way. Think about it: To do that your Django code has to consume the entire 3GB of data, do whatever it must do with it (probably in memory, unless you're doing anything to prevent that), then send the entire 3GB to the client, where it must again be processed in memory. It should be no surprise that the browser froze! I very much doubt that you need to display the entire 3GB of data at once. What is your actual goal here? – ChrisGPT was on strike Aug 07 '17 at 12:41
  • @Chris - the goal is to show the contents in a table. 50 rows at a time maybe. – Sidharth Samant Aug 07 '17 at 13:07

2 Answers


My approach would be to load the CSV files into a database and then have a pagination view do the heavy lifting (https://docs.djangoproject.com/en/1.11/topics/pagination/).
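Here is a minimal sketch of what that could look like, assuming a hypothetical CsvRow model with fields col1, col2, col3, a hypothetical csv_table.html template, and the Django 1.11 pagination API from the linked docs:

    import csv

    from django.core.paginator import EmptyPage, PageNotAnInteger, Paginator
    from django.shortcuts import render

    from myapp.models import CsvRow  # hypothetical model


    def load_csv(path, batch_size=5000):
        """Stream the CSV into the database without holding it all in memory."""
        with open(path, newline="") as f:
            batch = []
            for row in csv.reader(f):
                batch.append(CsvRow(col1=row[0], col2=row[1], col3=row[2]))
                if len(batch) >= batch_size:
                    CsvRow.objects.bulk_create(batch)
                    batch = []
            if batch:
                CsvRow.objects.bulk_create(batch)


    def csv_table(request):
        """Paginated view: only 50 rows are fetched from the database per request."""
        paginator = Paginator(CsvRow.objects.order_by("id"), 50)
        try:
            page = paginator.page(request.GET.get("page"))
        except PageNotAnInteger:
            page = paginator.page(1)
        except EmptyPage:
            page = paginator.page(paginator.num_pages)
        return render(request, "csv_table.html", {"page": page})

The template then only iterates over page.object_list, so the browser never has to render more than 50 rows at a time.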

James
  • That sounds like it will work. But is there any limit to how much the database can take? There may be thousands of GB-sized CSV files in the future. – Sidharth Samant Aug 07 '17 at 14:33
  • No, there is no limit. However, it depends on whether you store each CSV in its own table or not. If you store everything in one table, I suggest implementing a partitioning strategy; both MySQL and PostgreSQL support partitioning. In either case, make sure you add indexes for fast retrieval (see the sketch after these comments). – James Aug 07 '17 at 14:45
  • I have another problem. Let's say there are 3 values in each row of the CSV file, so I'd have 3 columns in the database table. But the CSV files all have different numbers of columns, and I can't possibly create new tables for all of them. – Sidharth Samant Aug 07 '17 at 14:45
  • In that case you might want to use MongoDB as it is suited exactly for this kind of scenario. – James Aug 07 '17 at 14:46
  • Okay, can you please point me to the specific feature of MongoDB that makes this possible? – Sidharth Samant Aug 07 '17 at 14:49
  • I haven't ever used MongoDB before (and haven't worked much on databases, in general) and I am finding it hard to figure out exactly which feature of MongoDB would help me out. – Sidharth Samant Aug 07 '17 at 14:57
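A minimal sketch of the indexing idea from the comments above, assuming the same hypothetical CsvRow model; the Meta.indexes option exists from Django 1.11 onward, while partitioning itself would be configured in MySQL or PostgreSQL rather than in Django:

    from django.db import models


    class CsvRow(models.Model):
        # Hypothetical model for one CSV with three columns.
        col1 = models.CharField(max_length=255)
        col2 = models.CharField(max_length=255)
        col3 = models.CharField(max_length=255)

        class Meta:
            indexes = [
                # Index the column used for filtering/ordering so paginated
                # queries stay fast as the table grows.
                models.Index(fields=["col1"]),
            ]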

You should read the CSV file via generators to keep memory usage low.

You can see how to do it via this link.
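A minimal sketch of the generator approach, assuming a hypothetical view that returns 50 rows per request as JSON and a hypothetical file path; only the current page is ever held in memory, although islice still has to skip past the earlier rows on every request, so deep pages get progressively slower:

    import csv
    from itertools import islice

    from django.http import JsonResponse


    def read_rows(path):
        """Yield CSV rows one at a time instead of reading the whole file."""
        with open(path, newline="") as f:
            for row in csv.reader(f):
                yield row


    def csv_page(request):
        """Return one 50-row page of the CSV as JSON."""
        page = int(request.GET.get("page", 1))
        page_size = 50
        rows = read_rows("/path/to/huge.csv")  # hypothetical path
        chunk = list(islice(rows, (page - 1) * page_size, page * page_size))
        return JsonResponse({"rows": chunk})

The AJAX callback in the template can then request ?page=1, ?page=2, and so on, and render each chunk into the table.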

Moe Far