I get Excel files during hackathons or Kaggle competitions that are gigabytes in size. My system (Intel i7, 8 GB RAM) crashes if I open one directly or load it in Python or R. Is there any way to split the file without opening it?

Charles Merriam
  • In Python, I believe so. In Excel, no. I would add tags for the languages you are open to using for a solution here. The Python people are not seeing this post since it's only tagged with Excel – urdearboy Nov 27 '19 at 16:35
  • Open a new workbook and use SQL queries in VBA to split it. https://stackoverflow.com/questions/27385245/using-excel-vba-to-run-sql-query – Andreas Nov 27 '19 at 16:48
  • If you are on Unix, there's the built-in `split`: http://man7.org/linux/man-pages/man1/split.1.html – αԋɱҽԃ αмєяιcαη Nov 28 '19 at 06:14

1 Answer

import csv
import os


# Defaults other than row_limit=10000 are assumptions; adjust as needed.
def split(filehandler, delimiter=',', row_limit=10000,
          output_name_template='output_%s.csv', output_path='.',
          keep_headers=True):
    """
    Splits a CSV file into multiple pieces.

    A quick bastardization of the Python CSV library.

    Arguments:

        `row_limit`: The number of rows you want in each output file. 10,000 by default.
        `output_name_template`: A %s-style template for the numbered output files.
        `output_path`: Where to stick the output files.
        `keep_headers`: Whether or not to print the headers in each output file.

    Example usage:

        >>> from toolbox import csv_splitter
        >>> csv_splitter.split(open('/home/ben/input.csv', 'r'))
    """
    reader = csv.reader(filehandler, delimiter=delimiter)
    current_piece = 1
    current_out_path = os.path.join(
        output_path,
        output_name_template % current_piece
    )
    # newline='' keeps the csv module from writing blank lines on Windows.
    current_out_file = open(current_out_path, 'w', newline='')
    current_out_writer = csv.writer(current_out_file, delimiter=delimiter)
    current_limit = row_limit
    if keep_headers:
        headers = next(reader)
        current_out_writer.writerow(headers)
    for i, row in enumerate(reader):
        if i + 1 > current_limit:
            # The current piece is full: close it and start the next numbered file.
            current_out_file.close()
            current_piece += 1
            current_limit = row_limit * current_piece
            current_out_path = os.path.join(
                output_path,
                output_name_template % current_piece
            )
            current_out_file = open(current_out_path, 'w', newline='')
            current_out_writer = csv.writer(current_out_file, delimiter=delimiter)
            if keep_headers:
                current_out_writer.writerow(headers)
        current_out_writer.writerow(row)
    # Flush and close the last piece.
    current_out_file.close()
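
The row-limit approach above can also be written with `itertools.islice`, which reads at most one chunk of rows at a time, so memory use stays bounded however large the input is. What follows is only a minimal sketch under some assumptions: the function name `split_csv_in_chunks`, its parameters, and the `part_%s.csv` template are hypothetical and not part of the original answer, the input is assumed to be a CSV export with a header row (not a native `.xlsx` workbook), and the small sample file is generated only so the example runs on its own.

```python
import csv
import os
import tempfile
from itertools import islice


def split_csv_in_chunks(in_path, out_dir, rows_per_file=10000,
                        name_template="part_%s.csv"):
    """Stream in_path and write numbered pieces into out_dir; return their paths.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    written = []
    with open(in_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return written  # empty input: nothing to split
        piece = 1
        while True:
            # islice pulls at most rows_per_file rows from the reader,
            # so only one chunk is held in memory at a time.
            chunk = list(islice(reader, rows_per_file))
            if not chunk:
                break
            out_path = os.path.join(out_dir, name_template % piece)
            with open(out_path, "w", newline="") as dst:
                writer = csv.writer(dst)
                writer.writerow(header)  # repeat the header in every piece
                writer.writerows(chunk)
            written.append(out_path)
            piece += 1
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a small sample CSV (25 data rows) just for demonstration.
        sample = os.path.join(tmp, "input.csv")
        with open(sample, "w", newline="") as f:
            w = csv.writer(f)
            w.writerow(["id", "value"])
            w.writerows([i, i * i] for i in range(25))
        parts = split_csv_in_chunks(sample, tmp, rows_per_file=10)
        print([os.path.basename(p) for p in parts])
```

Using `with` blocks closes every output file as soon as its chunk is written, which matters when a multi-gigabyte input produces hundreds of pieces.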