I have a nested numpy.ndarray with the following structure (each of the sublists has the same size):
len(exp_data) # Timepoints
Out[205]: 42
len(exp_data[0])
Out[206]: 1
len(exp_data[0][0]) # Y_bins
Out[207]: 13
len(exp_data[0][0][0]) # X_bins
Out[208]: 43
type(exp_data[0][0][0][0])
Out[209]: numpy.float64
I want to move these into a pandas DataFrame with three index columns (Timepoint, Y_bin, X_bin), each numbered from 0 to N, and a fourth column holding the float value. I could do this with a series of loops, but that seems like a very inefficient way of solving the problem.
In addition, I would like to get rid of any NaN values (not present in sample data). Should I do this after creating the DataFrame, or is there a way to skip adding them in the first place?
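One loop-free possibility, as a minimal sketch rather than a definitive solution: build a boolean mask of the non-NaN entries and let np.nonzero supply the integer indices, so NaNs are never added to the DataFrame. The names arr, arr3, keep and df_sketch are hypothetical, and the random array only stands in for the real data. A nested list would first need np.asarray(exp_data), which assumes all sublists have the same size.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for exp_data: shape (timepoints, 1, y_bins, x_bins)
rng = np.random.default_rng(0)
arr = rng.random((5, 1, 3, 5))
arr[0, 0, 0, 1] = np.nan  # one missing value for illustration

# Drop the singleton axis so the array is (timepoints, y_bins, x_bins)
arr3 = arr[:, 0, :, :]

# Boolean mask of the entries to keep; NaNs are excluded up front
keep = ~np.isnan(arr3)

# np.nonzero returns one integer index array per axis, in row-major order,
# which matches the order of the values selected by arr3[keep]
t_idx, y_idx, x_idx = np.nonzero(keep)

df_sketch = pd.DataFrame({
    'Timepoint': t_idx,
    'Y_bin': y_idx,
    'X_bin': x_idx,
    'Values': arr3[keep],
})
```

Because the index columns come straight from integer arrays, they keep an integer dtype without any extra casting.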
NOTE: the code below has been edited and I've added sample data.
import random
import numpy as np
import pandas as pd
exp_data = [[[[random.random() for x in range(5)],
              [random.random() for x in range(5)],
              [random.random() for x in range(5)],
              ]]]*5
# *5 repeats references to the same inner lists, so this NaN appears in every timepoint
exp_data[0][0][0][1] = np.nan
df = pd.DataFrame(columns = ['Timepoint','Y_bin','X_bin','Values'])
for t, timepoint in enumerate(exp_data):
    for y, y_bin in enumerate(timepoint[0]):
        for x, x_bin in enumerate(y_bin):
            df.loc[len(df)] = [int(t), int(y), int(x), x_bin]
df = df.dropna().reset_index(drop=True)
The final format should be as follows (I'd prefer integers instead of floats in the first three columns, but that's not essential; wrapping the values in int(t) etc. doesn't do the trick):
df
Out[291]:
   Timepoint  Y_bin  X_bin    Values
0        0.0    0.0    0.0  0.095391
1        0.0    0.0    2.0  0.963608
2        0.0    0.0    3.0  0.855735
3        0.0    0.0    4.0  0.392637
4        0.0    1.0    0.0  0.555199
5        0.0    1.0    1.0  0.118981
6        0.0    1.0    2.0  0.201782
...
len(df) # has received a total of 75 (5*3*5) input values of which 5 are nan
Out[293]: 70
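As for the float index columns, one option is to cast them once the DataFrame is complete, using DataFrame.astype with a per-column mapping. This is only a sketch under the assumption that the three index columns contain no NaN (true after dropna, since only Values can be missing). The names df_float, df_int and int_cols are hypothetical, and the small frame just mimics the first rows of the output above.

```python
import pandas as pd

# Hypothetical frame with float index columns, mimicking the output above
df_float = pd.DataFrame({
    'Timepoint': [0.0, 0.0, 0.0],
    'Y_bin': [0.0, 0.0, 1.0],
    'X_bin': [0.0, 2.0, 0.0],
    'Values': [0.095391, 0.963608, 0.555199],
})

# Cast only the three index columns; 'Values' stays float64
int_cols = ['Timepoint', 'Y_bin', 'X_bin']
df_int = df_float.astype({col: 'int64' for col in int_cols})
```

The cast returns a new DataFrame and leaves df_float unchanged.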