I have a dataframe with two columns, X and Y, that give the position of each value in a 2D array I need to create. The values in the 2D array come from another column in the dataframe. If the dataframe has more than one value column, I need one 2D list per value column.
So far I've been able to create one 2D list, assuming there is only one value column. I create an empty 2D list whose dimensions come from the X and Y columns, then loop through each row of the dataframe and fill the list at the position given by that row's X and Y.
Example dataframe: X represents the 'columns' of the 2D list and Y represents the 'rows', so in this case the 2D list has 2 rows and 3 columns. The column (numeric_result, voltage) supplies the values.
      | parent | child | numeric_result | X | Y |
index |        |       | voltage        |   |   |
0     | xy     | a     | 1.2            | 1 | 1 |
1     | xy     | a     | 1.1            | 2 | 1 |
2     | xy     | a     | 1.2            | 3 | 1 |
3     | xy     | a     | 1.1            | 1 | 2 |
4     | xy     | a     | 1.0            | 2 | 2 |
5     | xy     | a     | 1.3            | 3 | 2 |
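For anyone who wants to reproduce this, the example table can be rebuilt roughly as follows. The two-level column header is an assumption read off the table above, with an empty second level for every column except numeric_result; the real dataframe may be laid out differently.

```python
import pandas as pd

# Two-level header (assumed): only numeric_result has a named second level.
columns = pd.MultiIndex.from_tuples([
    ("parent", ""),
    ("child", ""),
    ("numeric_result", "voltage"),
    ("X", ""),
    ("Y", ""),
])

# One tuple per row of the example table.
data = [
    ("xy", "a", 1.2, 1, 1),
    ("xy", "a", 1.1, 2, 1),
    ("xy", "a", 1.2, 3, 1),
    ("xy", "a", 1.1, 1, 2),
    ("xy", "a", 1.0, 2, 2),
    ("xy", "a", 1.3, 3, 2),
]

dataframe = pd.DataFrame(data, columns=columns)
print(dataframe)
```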
First I create the 2D list:
    rows = 2
    cols = 3

    def make2dList(rows, cols):
        # Build a rows x cols list of zeros, with a separate inner list per row
        a = []
        for row in range(rows):
            a += [[0] * cols]
        return a

    list2d = make2dList(rows, cols)
Then I populate the list.
    import pandas as pd

    def modify2dlist(a, dataframe):
        # Loop through each row of the dataframe
        for i in range(len(dataframe.index)):
            col = int(dataframe.iloc[i].X)
            row = int(dataframe.iloc[i].Y)
            # X and Y are 1-based, so shift to 0-based list indices
            a[row-1][col-1] = pd.to_numeric(dataframe.loc[i, 'numeric_result'].values[0])
        return a

    finallist = modify2dlist(list2d, dataframe)
    print(finallist)

Output:

    [[1.2, 1.1, 1.2], [1.1, 1.0, 1.3]]
This seems inefficient. Is there a way to vectorize this or somehow make it faster?
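As a rough sketch of what a vectorized version could look like for a single value column, `DataFrame.pivot` can lay the values out by Y and X in one call. This assumes flat column names (the real dataframe has a two-level header), that each (X, Y) pair occurs exactly once, and that no positions are missing; `pivot` raises on duplicate pairs and leaves NaN for gaps.

```python
import pandas as pd

# Flat column names are an assumption made to keep the sketch short.
df = pd.DataFrame({
    "numeric_result": [1.2, 1.1, 1.2, 1.1, 1.0, 1.3],
    "X": [1, 2, 3, 1, 2, 3],
    "Y": [1, 1, 1, 2, 2, 2],
})

# Y becomes the row labels and X the column labels; both come out sorted.
grid = df.pivot(index="Y", columns="X", values="numeric_result")

# Convert the 2D table to a plain nested list.
list2d = grid.to_numpy().tolist()
print(list2d)
```

Because the grid shape comes from the distinct X and Y values, `rows` and `cols` do not have to be set by hand in this variant.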
In addition, I want to make a new dataframe like the one below, with one row for each combination of parent and child (there are many). Any help on how to create this dataframe would be appreciated. Thanks!
      | parent | child | numeric_result_list                |
index |        |       | voltage                            |
0     | xy     | a     | [[1.2, 1.1, 1.2], [1.1, 1.0, 1.3]] |
1     | xy     | b     | [[1.1, 1.0, 1.1], [1.4, 1.3, 1.5]] |
2     | xy     | c     | [[1.1, 1.0, 1.6], [1.4, 1.8, 1.5]] |
3     | yz     | e     | [[1.4, 1.2, 1.2], [1.7, 1.2, 1.0]] |
Edit: here is my code to create the dataframe with the 2D lists. Any help making it more efficient would be appreciated.
    # Create an empty dataframe with column names
    dffinal = pd.DataFrame(columns=['parent','child','numeric_result_list'])
    # Group by 'parent' and 'child'
    parent_child = df2.groupby(['parent', 'child'])
    i = 1
    for name, group in parent_child:
        print('Processing: ', name)
        # Restart the index at 0 so modify2dlist can look rows up by position
        group = group.reset_index(drop=True)
        array2d = make2dList(rows, cols)
        array2d = modify2dlist(array2d, group)
        dffinal.loc[i] = [name[0], name[1], array2d]
        i = i+1
    print('done')
    dffinal = dffinal.reset_index(drop=True)
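One possible way to avoid the row-by-row fill and the repeated `dffinal.loc[i]` assignment is to pivot each group and collect the results before building the dataframe once. This is only a sketch under assumptions: the column names are flat, `df2` here is a small stand-in built from the first two rows of the target table above, and `to_grid` is a hypothetical helper name.

```python
import pandas as pd

# Stand-in for df2 (assumed layout, flat column names).
df2 = pd.DataFrame({
    "parent": ["xy"] * 12,
    "child": ["a"] * 6 + ["b"] * 6,
    "numeric_result": [1.2, 1.1, 1.2, 1.1, 1.0, 1.3,
                       1.1, 1.0, 1.1, 1.4, 1.3, 1.5],
    "X": [1, 2, 3, 1, 2, 3] * 2,
    "Y": [1, 1, 1, 2, 2, 2] * 2,
})

def to_grid(group):
    # Hypothetical helper: lay one group's values out by Y (rows) and X (columns).
    grid = group.pivot(index="Y", columns="X", values="numeric_result")
    return grid.to_numpy().tolist()

# Build one record per (parent, child) group, then create the dataframe once.
records = [
    {"parent": parent, "child": child, "numeric_result_list": to_grid(group)}
    for (parent, child), group in df2.groupby(["parent", "child"])
]
dffinal = pd.DataFrame(records)
print(dffinal)
```

Collecting the records in a list and constructing the dataframe at the end avoids growing `dffinal` one row at a time, which tends to be slow when there are many groups.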