I have a csv file that contain only three columns but more than 200K rows. I want to split the csv file to multiple csv files depend on the 2nd columns (time columns) so same columns in each file but less rows (depend on my specification). I want the time duration to be variable, like I can put 10 second reading to every single file or 15 second or 19 second. I tried a couple of codes to split the csv file but I was not successful as I am very new to python.
The input csv file will be like this:
Col 0 Col 1 Col 2 Col 3
Data YYY 12:40:05 Data XXX
Data YYY 12:40:06 Data XXX
Data YYY 12:40:07 Data XXX
Data YYY 12:40:08 Data XXX
Data YYY 12:40:09 Data XXX
Data YYY 12:40:10 Data XXX
Data YYY 12:40:11 Data XXX
Data YYY 12:40:12 Data XXX
Data YYY 12:40:13 Data XXX
The output csv file, I want to be: file1
Col 0 Col 1 Col 2 Col 3
Data YYY 12:40:05 Data XXX
Data YYY 12:40:06 Data XXX
Data YYY 12:40:07 Data XXX
file2
Col 0 Col 1 Col 2 Col 3
Data YYY 12:40:08 Data XXX
Data YYY 12:40:09 Data XXX
Data YYY 12:40:10 Data XXX
file3
Col 0 Col 1 Col 2 Col 3
Data YYY 12:40:11 Data XXX
Data YYY 12:40:12 Data XXX
Data YYY 12:40:13 Data XXX
and so on till the end (the variable in above is equal to 3 seconds). My python code is:
import csv
from datetime import datetime
fieldnames = ['Col 0', 'Hour', 'Minute' , 'Second', 'Col 2' , 'Col 3']
files = {}
writers = {}
seconds = []
with open('4_Columns_PRi_Output.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
output_row = {}
output_row['Col 0'] = row['Col 0']
change_date = datetime.strptime(row['Col 1'].split(',')[0], '%H:%M:%S')
output_row['Hour'] = change_date.strftime('%H')
output_row['Minute'] = change_date.strftime('%M')
sec = change_date.strftime('%S')
output_row['Second'] = sec
if sec not in seconds:
output_file = open('corrected'+str(sec)+".csv", 'w')
writer = csv.DictWriter(output_file, fieldnames=fieldnames,lineterminator='\n')
writer.writeheader()
files[sec] = output_file
writers[sec] = writer
seconds.append(sec)
else:
output_file = open('corrected'+str(sec)+".csv", 'w+')
writer = csv.DictWriter(output_file, fieldnames=fieldnames,lineterminator='\n')
output_row['Col 2'] = row['Col 2']
output_row['Col 3'] = row['Col 3'].strip()
writers[sec].writerow(output_row)
for key in files:
files[key].close()
Your help will be highly appreciated.