
I am currently in the process of getting data from my stakeholder, who has a database from which he is going to extract it as a CSV file.

From there he is going to upload it to a shared drive, and I am going to pick up the data (probably download it) and use it as a local source to import into a pandas DataFrame.

The approximate size will be 40 million rows. I was wondering whether the data can be exported from the SQL database as a single CSV file and that CSV used as the source for a Python DataFrame, or whether it should be split into chunks, as I am not sure what the row limitation of a CSV file is.
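For reference, here is a minimal sketch of the chunked option in pandas. It assumes the export arrives as a single CSV with a header row. The function name `summarise_csv_in_chunks` and the default chunk size are illustrative choices, not part of the question.

```python
import pandas as pd


def summarise_csv_in_chunks(path: str, chunksize: int = 1_000_000) -> int:
    """Stream a large CSV through pandas without loading it all at once.

    Returns the total number of data rows seen (header excluded).
    """
    total_rows = 0
    # With chunksize set, read_csv returns an iterator of DataFrames,
    # each holding at most `chunksize` rows.
    for chunk in pd.read_csv(path, chunksize=chunksize):
        total_rows += len(chunk)
        # Per-chunk processing (filtering, aggregation, ...) would go here.
    return total_rows
```

For example, `summarise_csv_in_chunks("export.csv")` would report the row count of the downloaded file. The file name is hypothetical. If RAM really is not a constraint, a plain `pd.read_csv(path)` on the single file works as well. Chunking only matters when the whole table does not fit in memory.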

I don't think RAM and processing should be an issue at this time.

Your help is much appreciated. Cheers!

  • *approximate size will be 40 million rows* What size do you expect in terms of number of bytes? How much RAM do you have available? How will you process said data later? – Daweo Jul 06 '22 at 07:59
  • Does this answer your question? [Lazy Method for Reading Big File in Python?](https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python) – The Pjot Jul 06 '22 at 08:00
  • @Daweo Thanks for the comment, I have updated my question for clarity. – StupendousEnzio Jul 06 '22 at 08:29
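The question about bytes and RAM can be answered roughly before the full file is loaded. This is a sketch that assumes the first rows are representative of the rest. It reads a sample, measures its in-memory size, and extrapolates to the full row count. `estimate_memory_gb` is a hypothetical helper, not a pandas function.

```python
import pandas as pd


def estimate_memory_gb(path: str, total_rows: int, sample_rows: int = 100_000) -> float:
    """Rough in-memory size of a CSV once loaded into pandas, in GiB.

    Assumes the first `sample_rows` rows are representative of the whole file.
    """
    sample = pd.read_csv(path, nrows=sample_rows)
    # deep=True counts the actual bytes of Python string objects in object columns.
    bytes_per_row = sample.memory_usage(deep=True).sum() / max(len(sample), 1)
    return bytes_per_row * total_rows / 1024**3
```

For example, `estimate_memory_gb("export.csv", total_rows=40_000_000)` gives an order-of-magnitude figure to compare against the available RAM. The file name is hypothetical.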

2 Answers


If you can't connect directly to the database, you might need the .db file. I'm not sure a csv will even be able to handle more than a million or so rows.

Skay
  • What do you mean by _"I'm not sure a csv will even be able to handle more than a million or so rows."_? (I have worked with csv-files with about half a billion rows.) – Timus Jul 06 '22 at 09:38
  • Exactly. I just wanted to know whether a single dump of 40 million rows is an option, rather than splitting it into several files before putting them on the shared drive. – StupendousEnzio Jul 06 '22 at 11:20
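If a direct connection to the database is possible, pandas can read query results in chunks and skip the intermediate CSV entirely. This is a minimal sketch that assumes the database is SQLite (the ".db file" case mentioned in the answer). For other engines, pandas expects a SQLAlchemy connectable instead of a raw `sqlite3` connection. The helper name and query are hypothetical.

```python
import sqlite3

import pandas as pd


def count_rows_via_sql(conn: sqlite3.Connection, query: str, chunksize: int = 500_000) -> int:
    """Pull query results into pandas chunk by chunk, without a CSV step."""
    total = 0
    # With chunksize set, read_sql_query yields DataFrames instead of returning one.
    for chunk in pd.read_sql_query(query, conn, chunksize=chunksize):
        total += len(chunk)
        # Per-chunk processing would go here.
    return total
```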

> as I am not sure what the row limitation of csv file is.

There is no such limit inherent in the CSV format, if you understand CSV as the format defined by RFC 4180, which stipulates that a CSV file is

file = [header CRLF] record *(CRLF record) [CRLF]

where [...] denotes an optional part, CRLF denotes carriage return + line feed (\r\n), and *(...) denotes a part repeated zero or more times.
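Because the grammar simply appends one record after another, an export can be written to a single file in a streaming fashion. At no point does a row count limit come into play. This is a minimal sketch that assumes the source can be queried through a Python DB-API connection. An `sqlite3` connection is used here for illustration. `export_query_to_csv` is a hypothetical helper, and the stakeholder's real export tool may work differently.

```python
import csv
import sqlite3


def export_query_to_csv(conn: sqlite3.Connection, query: str, path: str,
                        batch_size: int = 100_000) -> int:
    """Write a query result to one CSV file batch by batch; return data rows written."""
    cursor = conn.execute(query)
    header = [col[0] for col in cursor.description]
    written = 0
    # newline="" lets the csv module control line endings; its default
    # "excel" dialect ends each record with CRLF ("\r\n"), as in RFC 4180.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
            written += len(rows)
    return written
```

Only one batch is held in memory at a time, so the same code writes 40 million rows or 40 rows into a single file.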

Daweo