I have a large CSV file (~14 GB, 124 columns) and I get a memory error while reading it:
df = pd.read_csv(r'C:\Users\AdamPer\Desktop\Python\Magisterka\test2.csv', encoding="utf_8_sig")
I tried setting low_memory=False and error_bad_lines=False, but that doesn't help, so I decided to set dtype explicitly and ran into a problem with that.
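For context on the memory side: read_csv also accepts a chunksize argument, which makes it return an iterator of DataFrames instead of loading the whole file at once. A minimal sketch, assuming the per-chunk work (here a running sum over Pot) can be done independently for each chunk; the in-memory sample stands in for the real file and its values are made up:

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real 14 GB file (made-up values).
sample = io.StringIO(
    "Hand_ID,Pot,Rake\n"
    "1,10.0,0.5\n"
    "2,20.0,1.0\n"
    "3,30.0,1.5\n"
)

total_pot = 0.0
row_count = 0

# With chunksize set, read_csv yields DataFrames of at most that many rows,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(sample, chunksize=2):
    total_pot += chunk["Pot"].sum()
    row_count += len(chunk)

print(row_count, total_pot)
```

For the real file the path and encoding="utf_8_sig" would replace the sample object, with a much larger chunksize. This only helps if the full table never has to be in memory at once.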
What I've done:
I made a smaller CSV file (~16 MB), read it into a DataFrame, and checked the column types with df.info(max_cols=200):
Soft 39347 non-null object
Hand_ID 39347 non-null int64
Table_Name 39345 non-null object
SmallBlind 39347 non-null float64
BigBlind 39347 non-null float64
Currency 39347 non-null object
Day 39347 non-null object
Hour 39347 non-null object
Seat_1 39347 non-null object
Seat_2 39347 non-null object
Seat_3 39347 non-null object
Seat_4 39347 non-null object
Seat_5 39347 non-null object
Seat_6 39347 non-null object
Stack_1 39347 non-null float64
Stack_2 39347 non-null float64
Stack_3 39347 non-null float64
Stack_4 39347 non-null float64
Stack_5 39347 non-null float64
Stack_6 39347 non-null float64
Raise_Pre_S1 39347 non-null object
Raise_Pre_S2 39347 non-null object
Raise_Pre_S3 39347 non-null object
Raise_Pre_S4 39347 non-null object
Raise_Pre_S5 39347 non-null object
Raise_Pre_S6 39347 non-null object
Call_Pre_S1 39347 non-null object
Call_Pre_S2 39347 non-null object
Call_Pre_S3 39347 non-null object
Call_Pre_S4 39347 non-null object
Call_Pre_S5 39347 non-null object
Call_Pre_S6 39347 non-null object
Flop_Bet_S1 39347 non-null float64
Flop_Bet_S2 39347 non-null float64
Flop_Bet_S3 39347 non-null float64
Flop_Bet_S4 39347 non-null float64
Flop_Bet_S5 39347 non-null float64
Flop_Bet_S6 39347 non-null float64
Flop_Raise_S1 39347 non-null object
Flop_Raise_S2 39347 non-null object
Flop_Raise_S3 39347 non-null object
Flop_Raise_S4 39347 non-null object
Flop_Raise_S5 39347 non-null object
Flop_Raise_S6 39347 non-null object
Flop_Call_S1 39347 non-null object
Flop_Call_S2 39347 non-null object
Flop_Call_S3 39347 non-null object
Flop_Call_S4 39347 non-null object
Flop_Call_S5 39347 non-null object
Flop_Call_S6 39347 non-null object
Saw_Flop_S1 39347 non-null int64
Saw_Flop_S2 39347 non-null int64
Saw_Flop_S3 39347 non-null int64
Saw_Flop_S4 39347 non-null int64
Saw_Flop_S5 39347 non-null int64
Saw_Flop_S6 39347 non-null int64
Turn_Bet_S1 39347 non-null float64
Turn_Bet_S2 39347 non-null float64
Turn_Bet_S3 39347 non-null float64
Turn_Bet_S4 39347 non-null float64
Turn_Bet_S5 39347 non-null float64
Turn_Bet_S6 39347 non-null float64
Turn_Raise_S1 39347 non-null object
Turn_Raise_S2 39347 non-null object
Turn_Raise_S3 39347 non-null object
Turn_Raise_S4 39347 non-null object
Turn_Raise_S5 39347 non-null object
Turn_Raise_S6 39347 non-null object
Turn_Call_S1 39347 non-null object
Turn_Call_S2 39347 non-null object
Turn_Call_S3 39347 non-null object
Turn_Call_S4 39347 non-null object
Turn_Call_S5 39347 non-null object
Turn_Call_S6 39347 non-null object
Saw_Turn_S1 39347 non-null int64
Saw_Turn_S2 39347 non-null int64
Saw_Turn_S3 39347 non-null int64
Saw_Turn_S4 39347 non-null int64
Saw_Turn_S5 39347 non-null int64
Saw_Turn_S6 39347 non-null int64
River_Bet_S1 39347 non-null float64
River_Bet_S2 39347 non-null float64
River_Bet_S3 39347 non-null float64
River_Bet_S4 39347 non-null float64
River_Bet_S5 39347 non-null float64
River_Bet_S6 39347 non-null float64
River_Raise_S1 39347 non-null object
River_Raise_S2 39347 non-null object
River_Raise_S3 39347 non-null object
River_Raise_S4 39347 non-null object
River_Raise_S5 39347 non-null object
River_Raise_S6 39347 non-null object
River_Call_S1 39347 non-null object
River_Call_S2 39347 non-null object
River_Call_S3 39347 non-null object
River_Call_S4 39347 non-null object
River_Call_S5 39347 non-null object
River_Call_S6 39347 non-null object
Saw_River_S1 39347 non-null int64
Saw_River_S2 39347 non-null int64
Saw_River_S3 39347 non-null int64
Saw_River_S4 39347 non-null int64
Saw_River_S5 39347 non-null int64
Saw_River_S6 39347 non-null int64
S1_shows? 39347 non-null int64
S2_shows? 39347 non-null int64
S3_shows? 39347 non-null int64
S4_shows? 39347 non-null int64
S5_shows? 39347 non-null int64
S6_shows? 39347 non-null int64
Winner?_S1 39347 non-null int64
Winner?_S2 39347 non-null int64
Winner?_S3 39347 non-null int64
Winner?_S4 39347 non-null int64
Winner?_S5 39347 non-null int64
Winner?_S6 39347 non-null int64
W/L_amount_S1 39347 non-null float64
W/L_amount_S2 39347 non-null float64
W/L_amount_S3 39347 non-null float64
W/L_amount_S4 39347 non-null float64
W/L_amount_S5 39347 non-null float64
W/L_amount_S6 39347 non-null float64
Pot 39347 non-null float64
Rake 39347 non-null float64
Based on that, I set the dtypes:
dtypes = {'Soft': object,
          'Hand_ID': np.int64,
          'Table_Name': object,
          'SmallBlind': np.float64,
          'BigBlind': np.float64,
          'Currency': object,
          'Day': object,
          'Hour': object,
          'Seat_1': object, 'Seat_2': object, 'Seat_3': object, 'Seat_4': object, 'Seat_5': object, 'Seat_6': object,
          'Stack_1': np.float64, 'Stack_2': np.float64, 'Stack_3': np.float64, 'Stack_4': np.float64, 'Stack_5': np.float64, 'Stack_6': np.float64,
          'Raise_Pre_S1': object, 'Raise_Pre_S2': object, 'Raise_Pre_S3': object, 'Raise_Pre_S4': object, 'Raise_Pre_S5': object, 'Raise_Pre_S6': object,
          'Call_Pre_S1': object, 'Call_Pre_S2': object, 'Call_Pre_S3': object, 'Call_Pre_S4': object, 'Call_Pre_S5': object, 'Call_Pre_S6': object,
          'Flop_Bet_S1': np.float64, 'Flop_Bet_S2': np.float64, 'Flop_Bet_S3': np.float64, 'Flop_Bet_S4': np.float64, 'Flop_Bet_S5': np.float64, 'Flop_Bet_S6': np.float64,
          'Flop_Raise_S1': object, 'Flop_Raise_S2': object, 'Flop_Raise_S3': object, 'Flop_Raise_S4': object, 'Flop_Raise_S5': object, 'Flop_Raise_S6': object,
          'Flop_Call_S1': object, 'Flop_Call_S2': object, 'Flop_Call_S3': object, 'Flop_Call_S4': object, 'Flop_Call_S5': object, 'Flop_Call_S6': object,
          'Saw_Flop_S1': np.int64, 'Saw_Flop_S2': np.int64, 'Saw_Flop_S3': np.int64, 'Saw_Flop_S4': np.int64, 'Saw_Flop_S5': np.int64, 'Saw_Flop_S6': np.int64,
          'Turn_Bet_S1': np.float64, 'Turn_Bet_S2': np.float64, 'Turn_Bet_S3': np.float64, 'Turn_Bet_S4': np.float64, 'Turn_Bet_S5': np.float64, 'Turn_Bet_S6': np.float64,
          'Turn_Raise_S1': object, 'Turn_Raise_S2': object, 'Turn_Raise_S3': object, 'Turn_Raise_S4': object, 'Turn_Raise_S5': object, 'Turn_Raise_S6': object,
          'Turn_Call_S1': object, 'Turn_Call_S2': object, 'Turn_Call_S3': object, 'Turn_Call_S4': object, 'Turn_Call_S5': object, 'Turn_Call_S6': np.float64,
          'Saw_Turn_S1': np.int64, 'Saw_Turn_S2': np.int64, 'Saw_Turn_S3': np.int64, 'Saw_Turn_S4': np.int64, 'Saw_Turn_S5': np.int64, 'Saw_Turn_S6': np.int64,
          'River_Bet_S1': np.float64, 'River_Bet_S2': np.float64, 'River_Bet_S3': np.float64, 'River_Bet_S4': np.float64, 'River_Bet_S5': np.float64, 'River_Bet_S6': np.float64,
          'River_Raise_S1': object, 'River_Raise_S2': object, 'River_Raise_S3': object, 'River_Raise_S4': object, 'River_Raise_S5': object, 'River_Raise_S6': object,
          'River_Call_S1': object, 'River_Call_S2': object, 'River_Call_S3': object, 'River_Call_S4': object, 'River_Call_S5': object, 'River_Call_S6': object,
          'Saw_River_S1': np.int64, 'Saw_River_S2': np.int64, 'Saw_River_S3': np.int64, 'Saw_River_S4': np.int64, 'Saw_River_S5': np.int64, 'Saw_River_S6': np.int64,
          'S1_shows?': np.int64, 'S2_shows?': np.int64, 'S3_shows?': np.int64, 'S4_shows?': np.int64, 'S5_shows?': np.int64, 'S6_shows?': np.int64,
          'Winner?_S1': np.int64, 'Winner?_S2': np.int64, 'Winner?_S3': np.int64, 'Winner?_S4': np.int64, 'Winner?_S5': np.int64, 'Winner?_S6': np.int64,
          'W/L_amount_S1': np.float64, 'W/L_amount_S2': np.float64, 'W/L_amount_S3': np.float64, 'W/L_amount_S4': np.float64, 'W/L_amount_S5': np.float64, 'W/L_amount_S6': np.float64,
          'Pot': np.float64,
          'Rake': np.float64}
and tried to read the same CSV with this code:
df = pd.read_csv(r'C:\Users\AdamPer\Desktop\Python\Magisterka\test2.csv', encoding= "utf_8_sig", dtype=dtypes)
and it raises an error:
ValueError: could not convert string to float: '[]'
Any ideas how to solve this? Link to smaller csv file
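One way to narrow down where the '[]' comes from is to stream the file and check which of the columns declared as float contain values that do not parse as numbers. The mapping above declares 'Turn_Call_S6' as np.float64 while df.info() reports it as object, which may be what triggers the error. The sketch below uses the standard csv module so nothing large is held in memory. The helper name find_non_float_values, the column subset and the sample rows are made up for illustration, and Python's float() is only an approximation of how pandas parses numbers:

```python
import csv
import io

# Made-up sample standing in for the real file.
sample = io.StringIO(
    "Hand_ID,Turn_Bet_S6,Turn_Call_S6\n"
    "1,0.5,[]\n"
    "2,1.0,[0.25]\n"
    "3,,[]\n"
)

# Hypothetical subset of the columns declared as np.float64 in the mapping.
float_columns = ["Turn_Bet_S6", "Turn_Call_S6"]


def find_non_float_values(handle, columns, max_examples=3):
    """Return {column: [example values]} for cells that float() rejects."""
    problems = {}
    for row in csv.DictReader(handle):
        for name in columns:
            value = row[name]
            if value == "":
                continue  # empty cells become NaN in pandas, not an error
            try:
                float(value)
            except ValueError:
                examples = problems.setdefault(name, [])
                if value not in examples and len(examples) < max_examples:
                    examples.append(value)
    return problems


bad = find_non_float_values(sample, float_columns)
print(bad)
```

For the real file, the handle would come from open(path, newline="", encoding="utf_8_sig"). Any column that shows up in the result would need dtype object (or a converter) rather than np.float64.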