I have a file with 50 GB of data. I know how to use Pandas for my data analysis.
I only need the last 1000 lines or rows, not the complete 50 GB.
Hence, I thought of using the nrows option in read_csv().
I have written the code like this:
import pandas as pd
df = pd.read_csv("Analysis_of_50GB.csv",encoding="utf-16",nrows=1000,index_col=0)
But that takes the top 1000 rows, and I need the last 1000 rows. So I did this and received an error:
df = pd.read_csv("Analysis_of_50GB.csv",encoding="utf-16",nrows=-1000,index_col=0)
ValueError: 'nrows' must be an integer >=0
I have even tried using the chunksize option in read_csv(). But it still loads the complete file, and the output was not a DataFrame but an iterable.
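For context, my chunksize attempt looked roughly like this (a sketch from memory; the chunk size of 1,000,000 rows was an arbitrary choice, and I kept only the tail of each chunk to try to bound memory):

import pandas as pd

# read_csv with chunksize returns an iterator of DataFrames,
# not a single DataFrame
reader = pd.read_csv("Analysis_of_50GB.csv", encoding="utf-16",
                     index_col=0, chunksize=1_000_000)

tail = None
for chunk in reader:
    # keep only the most recent 1000 rows seen so far
    tail = chunk if tail is None else pd.concat([tail, chunk])
    tail = tail.iloc[-1000:]

print(tail)  # ends up as the last 1000 rows

Even with this, the whole file still has to be scanned, which is exactly what I am trying to avoid.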
Hence, please let me know what I can do in this scenario.
Please NOTE THAT I DO NOT WANT TO OPEN THE COMPLETE FILE...