I have a situation where a CSV file may contain up to 300 columns, of which about 50 are list-valued (space-separated), and those lists can have uneven sizes, including no values at all.
Time COL1 COL2
2020-03-13 10:43:00.500 0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 90.0
2020-03-13 10:43:00.900 10.0 20.0 30.0 70.0 10.0 20.0
2020-03-13 10:44:00.100 10.0 20.0 30.0 70.0
I want to:
1. Explode the list columns into rows based on the highest frequency (number of space-separated values) in each row. For example, the max frequency is 5 in the 1st row and 4 in the 2nd row:
Time COL1 COL2
2020-03-13 10:43:00.500 0.0 50.0
2020-03-13 10:43:00.580 10.0 60.0
2020-03-13 10:43:00.660 20.0 70.0
2020-03-13 10:43:00.740 30.0 80.0
2020-03-13 10:43:00.820 40.0 90.0
2020-03-13 10:43:00.900 10.0 10.0
2020-03-13 10:43:01.150 20.0 20.0
2020-03-13 10:43:01.400 30.0
2020-03-13 10:43:01.650 70.0
2020-03-13 10:44:00.100 10.0
2020-03-13 10:44:00.350 20.0
2020-03-13 10:44:00.600 30.0
2020-03-13 10:44:00.850 70.0
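One way to sketch point 1 is to pad each row's shorter lists with NaN up to that row's max frequency and then explode all list columns at once. This is only a sketch on a small sample frame mirroring the data above, and it assumes pandas >= 1.3 (the first version where `DataFrame.explode` accepts multiple columns):

```python
import pandas as pd
import numpy as np

# Small sample mirroring the question: list columns hold space-separated strings.
df = pd.DataFrame({
    "Time": ["2020-03-13 10:43:00.500", "2020-03-13 10:43:00.900"],
    "COL1": ["0.0 10.0 20.0 30.0 40.0", "10.0 20.0 30.0 70.0"],
    "COL2": ["50.0 60.0 70.0 80.0 90.0", "10.0 20.0"],
})
list_cols = ["COL1", "COL2"]

# Split the space-separated strings into lists.
for c in list_cols:
    df[c] = df[c].str.split()

# Per-row max frequency = length of the longest list in that row.
max_len = df[list_cols].apply(lambda r: max(len(v) for v in r), axis=1)

# Pad every list up to its row's max length with NaN so all lists are even.
for c in list_cols:
    df[c] = [lst + [np.nan] * (n - len(lst)) for lst, n in zip(df[c], max_len)]

# pandas >= 1.3 can explode several columns in one call when lengths match.
out = df.explode(list_cols, ignore_index=True)
out[list_cols] = out[list_cols].astype(float)
```

The padding step is what makes the multi-column explode work on uneven rows; without it, `explode` raises on mismatched list lengths.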
2. The Time field of each exploded row should fall between the times of the two subsequent original rows, something like this:

    curr_row_time = data_frame['Time'][ind1]
    next_row_time = data_frame['Time'][ind1 + 1]
    timestamp1 = datetime.datetime.strptime(str(curr_row_time), '%Y-%m-%d %H:%M:%S.%f')
    timestamp2 = datetime.datetime.strptime(str(next_row_time), '%Y-%m-%d %H:%M:%S.%f')
    time_delta = (timestamp2 - timestamp1).total_seconds() * 1000
    time_step_increment = time_delta / max_frequency  # max frequency for that row

For the last row there is no next row, so its max frequency can be taken either as the average or as the last used max frequency. I would appreciate any suggestion for point 1; point 2 is second priority, but it would be good to have as well.
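The timestamp logic above can also be sketched with pandas instead of `datetime.strptime`. This is only a sketch on a hypothetical two-row sample (`times` and `max_freq` stand in for the original Time column and the per-row max frequencies), reusing the previous row's delta for the last row, which is one of the two options mentioned:

```python
import pandas as pd

# Hypothetical per-row inputs mirroring the example above.
times = pd.to_datetime(pd.Series([
    "2020-03-13 10:43:00.500",
    "2020-03-13 10:43:00.900",
]), format="%Y-%m-%d %H:%M:%S.%f")
max_freq = [5, 4]  # per-row max frequency (number of exploded sub-rows)

# Millisecond gap to the next original row; the last row has no successor,
# so reuse the previous row's gap.
deltas_ms = (times.shift(-1) - times).dt.total_seconds() * 1000
deltas_ms.iloc[-1] = deltas_ms.iloc[-2]

# Per-sub-row increment for each original row.
step_ms = deltas_ms / max_freq

# One interpolated timestamp per exploded sub-row.
new_times = [
    t + pd.Timedelta(milliseconds=k * step)
    for t, step, n in zip(times, step_ms, max_freq)
    for k in range(n)
]
```

The resulting `new_times` list lines up one-to-one with the exploded rows, so it can be assigned directly as the new Time column.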
For point 1 I tried the solution mentioned in the link below, after converting the space-separated strings to lists (10.0 20.0 30.0 70.0 ----> [10.0, 20.0, 30.0, 70.0]): Pandas Explode on Multiple columns. It fails with:
File "<stdin>", line 3, in <listcomp>
File "<__array_function__ internals>", line 6, in concatenate
ValueError: zero-dimensional arrays cannot be concatenated
The link below works only for equal-size list columns: Split (explode) pandas dataframe string entry to separate rows
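The equal-size restriction (and the `ValueError` above) can also be avoided without padding by iterating rows and zipping the lists with `itertools.zip_longest`, which fills the shorter lists with `None`. A minimal sketch on a hypothetical one-row sample with an empty cell; `iterrows` is slow on large frames, so this is for illustration only:

```python
import pandas as pd
from itertools import zip_longest

# Hypothetical one-row sample; COL2 is an empty cell (no values at all).
df = pd.DataFrame({
    "Time": ["2020-03-13 10:44:00.100"],
    "COL1": ["10.0 20.0 30.0 70.0"],
    "COL2": [""],
})
list_cols = ["COL1", "COL2"]

rows = []
for _, r in df.iterrows():
    cols = [str(r[c]).split() for c in list_cols]
    # zip_longest pads the shorter (or empty) lists with None, so uneven
    # sizes never hit the np.concatenate error shown above.
    for vals in zip_longest(*cols):
        rows.append({"Time": r["Time"], **dict(zip(list_cols, vals))})

out = pd.DataFrame(rows)
```

Each original row contributes exactly max-frequency sub-rows, with `None` where a column ran out of values.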