I have many huge CSV files (2-4 GB), I just want to rewrite the headers of CSV files without loading the whole file.

What's the optimal way to do this? Headers on each file are different, just want to give header names as input parameter and join it with the body of the file.

Bauka · Tomerikoo
  • Adding data to a file makes it necessary to re-write the file from there on. So if you add a header you have to read and write the whole file. – Klaus D. Jan 28 '21 at 12:55
  • If the new line with headers has the same size - the same number of bytes - then you can simply overwrite it in the file. For a shorter one you could try to add spaces to make it the same size. But if it needs more bytes then you will have to create a new file. – furas Jan 28 '21 at 12:56
  • If you are going to use the csv in pandas later, you can simply change the headers of the loaded csv dataframe using pandas methods, rather than change the csv itself. – DapperDuck Jan 28 '21 at 12:58
  • https://stackoverflow.com/a/14947384/4440387 – Denis Rasulev Jan 28 '21 at 13:02
  • I made an assumption that reading only the first line would probably be more optimized, which is why my question starts with "Is there". The problem is how quickly the process will run - I want the fastest way to transform the header. – Bauka Jan 28 '21 at 13:10
  • @DenisRasulev If you think this question has an answer somewhere else on this site - [flag it as a duplicate](https://stackoverflow.com/help/privileges/flag-posts) instead of posting a link to an answer... – Tomerikoo Jan 28 '21 at 13:13
  • @Tomerikoo, thanks for the feedback, next time will do. Upvoted your comment. Cheers. – Denis Rasulev Jan 28 '21 at 16:52
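The in-place overwrite that furas describes in the comments can be sketched as follows. This is a minimal sketch, not code from the question or answer: the function name and file path are my own, it assumes `\n` line endings, and the space padding ends up as part of the last column name when the file is later parsed.

```python
def overwrite_header_in_place(path, new_header):
    """Replace the first line of `path` with `new_header`, padding with
    spaces so the byte count stays identical and the body is untouched."""
    with open(path, 'r+b') as f:
        old = f.readline()                  # original header, newline included
        new = new_header.encode()
        if len(new) + 1 > len(old):         # +1 accounts for the newline
            raise ValueError('new header is longer than the old one')
        padded = new + b' ' * (len(old) - len(new) - 1) + b'\n'
        f.seek(0)
        f.write(padded)                     # same size: no rewrite of the rest
```

Because the file size never changes, this runs in constant time regardless of how big the CSV is - but it only works when the new header fits in the old one's byte count, and the trailing spaces stick to the last column name.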

1 Answer

Assuming your CSV is a line-by-line file (which it is by definition), the file object already supports iterating over it one line at a time:

for line in open('your_really_very_so_big_file.csv'):
    process_data(line)  # one line at a time; the whole file is never in memory

However, this is read-only; if you want to write, you have to rewrite the whole stream. A good approach for your case is to copy the file after editing the header, something like:

import shutil

with open('source.csv') as source, open('destination.csv', 'w') as destination:
    source.readline()                        # skip the original header
    destination.write(edited_line)           # write the new header line
    shutil.copyfileobj(source, destination)  # stream-copy the rest of the file

EDIT:

Similar answer here: shutil.copyfileobj

Doc: shutil.copyfileobj
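For completeness, the copy-based approach above could be wrapped up like this. This is my own sketch, not the answer's exact code: the function name is hypothetical, and the temporary file plus `os.replace` step (so the original file is swapped atomically) are additions beyond what the answer shows.

```python
import os
import shutil
import tempfile

def rewrite_header(path, new_header):
    """Copy `path` to a temp file with `new_header` as its first line,
    then atomically replace the original file."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with open(path, 'rb') as source, os.fdopen(fd, 'wb') as destination:
            source.readline()                         # skip the old header
            destination.write(new_header.encode() + b'\n')
            shutil.copyfileobj(source, destination)   # stream the body in chunks
        os.replace(tmp_path, path)                    # atomic on the same filesystem
    except BaseException:
        os.remove(tmp_path)
        raise
```

`shutil.copyfileobj` copies in fixed-size chunks (64 KB by default), so memory use stays constant even for multi-gigabyte files; the cost is the disk I/O of reading and writing the whole file once.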

Younes
  • I don't see how this answers the question which is *Is there a very optimal way __to rewrite headers__ of huge csv file* – Tomerikoo Jan 28 '21 at 13:06
  • 1
    So now you simply copied [this answer](https://stackoverflow.com/a/14947384/4440387)? What's `destination`? – Tomerikoo Jan 28 '21 at 13:11
  • 1
    The copy to a new file is better than reading/writing the whole stream in your code but still doesn't solve the problem of huge files(Sorry part of my answer wasn't loaded) – Younes Jan 28 '21 at 13:12
  • 1- Didn't see there was already a look-alike answer. 2- I'm here just trying to help you with your problem. 3- By destination I mean the destination csv file that you create. – Younes Jan 28 '21 at 13:14
  • 1- this is not my question. 2- did you mean `destination.write(edited_line)`? – Tomerikoo Jan 28 '21 at 13:31
  • Yes, I've edited it, thanks. – Younes Jan 28 '21 at 13:37