
I have a CSV loaded into a pandas DataFrame. I update only one record of the DataFrame and write it back to the original CSV file. I know that, in order to write back, we have to write the whole DataFrame to CSV,

like this:

df1.to_csv(file_name)

However, is there any way we can write back only the updated values to the CSV instead of rewriting the whole file?

Please advise
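For what it's worth, CSV has no fixed-width rows or row index, so pandas cannot patch one line in place; a common workaround outside pandas is to stream the file through a temporary copy, swapping only the affected line, so only one row is ever held in memory (though every byte is still copied once). A minimal sketch — the function name `update_csv_row` and its row-numbering convention are my own, not part of the pandas API:

```python
import os
import tempfile

def update_csv_row(path, row_index, new_line):
    """Replace one data row of a CSV by streaming it to a temp copy.

    row_index counts data rows: 0 is the first row after the header.
    new_line is the replacement row as a raw CSV string (no newline).
    """
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w", newline="") as tmp, open(path, newline="") as src:
        tmp.write(next(src))  # copy the header line unchanged
        for i, line in enumerate(src):
            # swap in the new row at the target index, copy the rest as-is
            tmp.write(new_line + "\n" if i == row_index else line)
    os.replace(tmp_path, path)  # atomic swap on the same filesystem
```

This avoids materializing millions of records in memory, but note that a full pass over the file is still unavoidable unless the lines are fixed-width (in which case you could `seek` and overwrite in place), which is why the commenters suggest a database or HDFStore for frequent single-record updates.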

Doubt Dhanabalu
  • Why do you want to write the file after every update ? Can't you make some fix number of updates and then write the file? – YOLO Apr 04 '18 at 12:14
  • Possible duplicate of [How to modify a text file?](https://stackoverflow.com/questions/125703/how-to-modify-a-text-file) – shivsn Apr 04 '18 at 12:14
  • @ManishSaraswat - It will get updated only once during a process and I need to write back only once to the CSV. But still, I assume that if there are millions of records, updating one single value will use the system resources of writing millions of records – Doubt Dhanabalu Apr 04 '18 at 12:17
  • You cannot do that with pandas. You need to keep track of affected lines and update them separately. That said, I think your actual solution is using a database. – ayhan Apr 04 '18 at 12:17
  • @ayhan there could be a very real use case where you have to update a fraction of a file and a db can't be used there – Arpit Solanki Apr 04 '18 at 12:18
  • I recommend you look up HDFStore, e.g. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.HDFStore.append.html. This can be converted easily into a `pandas` dataframe in memory if required. – jpp Apr 04 '18 at 12:25
  • Possible duplicate of [Updating a specific row in csv file](https://stackoverflow.com/questions/41574037/updating-a-specific-row-in-csv-file) – rafaelc Apr 04 '18 at 12:26
  • @ArpitSolanki I highly doubt that there is a use case where you have tabular data so you can use pandas but you *cannot* use a db. – ayhan Apr 04 '18 at 12:32
  • They can use a db if they don't need to do processing, but in this case I am pretty sure the OP is doing some data processing step by step. You can look at frameworks like Spark, Dask etc. if you are unsure about the use case. @ayhan – Arpit Solanki Apr 04 '18 at 12:38
  • @ArpitSolanki pandas loads data into memory. So spark and dask are irrelevant here. Processing happens *after* you read data. You provide no case where you have to read data from a flat file instead of a db. – ayhan Apr 04 '18 at 12:44

0 Answers