0

I have a Pandas dataframe that contains two type of columns, the ones that contain numpy arrays and the ones that contain floats.

E.g.:

      arr1   f1
[0.3, 1.3]  3.5

I need to split the values of the arrays into rows.

If I only have these two columns, I can easily split the array like this:

df = pd.DataFrame([[x, j] for i, j in zip(a['arr1'], a['f1']) for x in list(i)], columns=['arr1', 'df1])

And the result looks like this:

arr1   f1
0.3   3.5
1.3   3.5

However, I can have any number of array and float columns, where the arrays in the same row always have the same length, i.e.: I can match all the elements of both arrays ((a1, b1), (a2, b2), ..., (a_n, b_n)).

      arr1   f1        arr2  f2
[0.3, 1.3]  3.5 [14.8, 1.2] 1.6

After splitting the arrays, the result should look like this:

arr1   f1  arr2  f2
 0.3  3.5  14.8 1.6
 1.3  3.5  1.2  1.6

It is easy to know what kind of data a column has, so that I can use zip accordingly, but anyway I can't figure out how to make it work for any number of columns.

6659081
  • 381
  • 7
  • 21

2 Answers2

1

Solutions if order of columns is important - list comprehensions with DataFrame.explode:

dfs = [df[list(x)].explode(x[0]) for x in zip(df.columns[::2], df.columns[1::2])]

df = pd.concat(dfs, axis=1).reset_index(drop=True)
print (df)
  arr1   f1  arr2   f2
0  0.3  3.5  14.8  1.6
1  1.3  3.5   1.2  1.6

And your solution:

dfs = [pd.DataFrame([[a, j] for i, j in zip(df[x[0]], 
                                            df[x[1]]) for a in i], columns=x) 
                    for x in zip(df.columns[::2], df.columns[1::2])]

df = pd.concat(dfs, axis=1)
print (df)
   arr1   f1  arr2   f2
0   0.3  3.5  14.8  1.6
1   1.3  3.5   1.2  1.6
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Let us try unnesting

df = unnesting(df, ['arr1','arr2'] , axis=1)

def unnesting(df, explode, axis):
    if axis==1:
        idx = df.index.repeat(df[explode[0]].str.len())
        df1 = pd.concat([
            pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
        df1.index = idx

        return df1.join(df.drop(explode, 1), how='left')
    else :
        df1 = pd.concat([
                         pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(explode, 1), how='left')
BENY
  • 317,841
  • 20
  • 164
  • 234