I have a CSV file with no header row and a variable number of fields on each line.
Each record can have up to 398 fields, but I want to keep only 256 of them in my DataFrame, since those are the only fields I need to process.
Below is a slimmed-down version of the file.
1,2,3,4,5,6
12,34,45,65
34,34,24
In the example above, I would like to keep only 3 fields from each line (analogous to the 256 above) when calling read_csv.
I tried the following:
import pandas as pd
df = pd.read_csv('sample.csv',header=None)
I get the following error, apparently because pandas uses the first line to generate the column metadata:
File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 4, saw 10
The only solution I can think of is passing
names = ['column1','column2','column3','column4','column5','column6']
when creating the DataFrame.
But the real files can be up to 50 MB, and I don't want to do that because it takes a lot of memory. I am running this on AWS Lambda, where the extra memory will cost more, and I have to process a large number of files daily.
My question is: can I create the DataFrame with only the 256 fields I need while reading the CSV? Can that be my first step?
I am very new to pandas, so please bear with my ignorance. I searched for a solution for a long time but could not find one.
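To make the goal concrete, here is a rough sketch of the behaviour I am after. It is written with the standard csv module rather than read_csv, and it assumes that the fields to keep are always the leading ones on each line. The helper name read_first_fields and the inline sample data are made up for illustration; this is not necessarily the most efficient way to do it.

```python
import csv
import io

import pandas as pd


def read_first_fields(source, n_fields):
    """Read a headerless CSV, keeping only the first n_fields of each line.

    Hypothetical helper: `source` is any iterable of text lines,
    for example an open file object.
    """
    rows = []
    for record in csv.reader(source):
        if not record:
            continue  # skip blank lines
        kept = record[:n_fields]
        # Pad short lines with None so every row has the same width.
        kept += [None] * (n_fields - len(kept))
        rows.append(kept)
    return pd.DataFrame(rows)


# The slim sample from above; the real call would use open('sample.csv', newline='').
sample = io.StringIO("1,2,3,4,5,6\n12,34,45,65\n34,34,24\n")
df = read_first_fields(sample, 3)
print(df)
```

In this sketch the values stay as strings, so a type conversion would still be needed afterwards, and only the kept fields of each line are held in memory.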