
I have been given a task to read a particular number of lines of data from a CSV file and delete them. Say the file holds a timestamp, age, and gender for each user (in CSV format), and I want to read it 10 lines at a time to avoid overloading memory. But I have no idea how to do it. Can you suggest how to solve this problem efficiently? Thanks

ForceBru
  • Have a look into this post, https://stackoverflow.com/a/33646592/4985099 – sushanth Jun 23 '21 at 15:36
  • Does this answer your question? [Lazy Method for Reading Big File in Python?](https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python) – Luke Storry Jun 23 '21 at 15:36
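The linked posts describe reading a big file lazily in plain Python, without pandas. A minimal sketch of that idea, reading a CSV in fixed-size chunks of parsed rows (the function name and chunk size are illustrative, not from the linked answers):

```python
import csv
from itertools import islice

def read_in_chunks(path, n=10):
    """Yield lists of up to n parsed CSV rows at a time."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        while True:
            # islice pulls at most n rows without loading the whole file
            chunk = list(islice(reader, n))
            if not chunk:
                break
            yield chunk
```

Because `islice` consumes the reader lazily, only one chunk of rows is ever in memory at a time.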

1 Answer

import pandas as pd

# read the file in chunks of 10 rows each;
# note: each chunk is a DataFrame, not a single line
chunk_iterator = pd.read_csv('sample.csv', chunksize=10)

# write the rows you want to keep to a new file
with open('updated_sample.csv', 'w', newline='') as outf:
    for i, chunk in enumerate(chunk_iterator):
        # keep only the rows satisfying your condition
        kept = chunk[<insert condition>]
        # write the header once, with the first chunk
        kept.to_csv(outf, header=(i == 0), index=False)

The above efficiently reads in the CSV file in chunks using pandas, and writes out only the rows that meet the particular condition you set, which effectively 'deletes' the others. (You did not specify what this condition was, so a placeholder is left in its place.)

One caveat: you cannot open sample.csv for writing while the chunk iterator is still reading it. Opening a file in 'w' mode truncates it immediately, and because pd.read_csv(..., chunksize=...) reads lazily, that would destroy the data before pandas ever sees it. Write to a separate file as above; if you want the result under the original name, replace it afterwards, e.g. with os.replace('updated_sample.csv', 'sample.csv').
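If you do want the result to land back in the original file, one safe pattern is to stream the kept chunks into a temporary file and atomically swap it over the original. A hedged sketch (the function name `filter_csv_inplace` and the `keep` callback are assumptions, not part of the original answer):

```python
import os
import tempfile
import pandas as pd

def filter_csv_inplace(path, keep, chunksize=10):
    """Stream `path` in chunks, keep rows where keep(chunk) yields a
    True mask, then atomically replace the original file."""
    # create the temp file in the same directory so os.replace is atomic
    fd, tmp_path = tempfile.mkstemp(
        dir=os.path.dirname(os.path.abspath(path)), suffix='.csv')
    try:
        with os.fdopen(fd, 'w', newline='') as outf:
            for i, chunk in enumerate(pd.read_csv(path, chunksize=chunksize)):
                # keep() returns a boolean mask selecting rows to retain
                chunk[keep(chunk)].to_csv(outf, header=(i == 0), index=False)
        os.replace(tmp_path, path)  # atomic swap over the original
    except BaseException:
        os.remove(tmp_path)  # clean up the partial file on failure
        raise
```

Usage with the question's columns might look like `filter_csv_inplace('sample.csv', lambda c: c['age'] >= 18)`, keeping only adult users while never holding more than one chunk in memory.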