4

I have a pandas dataframe. It looks like this:

pd.DataFrame(data=np.arange(1,10).reshape(3,3), index=['A', 'B', 'C'], columns=['A', 'B', 'C'])

but has 100 rows and 100 columns.

I want to flatten it, so that it looks like this:

pd.DataFrame({'row' : ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'], 'col' : ['A', 'B', 'C']*3, 'val' : np.arange(1,10)})

What's the most efficient way to do this?

Thanks,

Jack

3 Answers3

6

Option 1
Not 100% sure on efficiency, but the simplest way to do this is with df.melt.

df.rename_axis('row')\
  .reset_index()\
  .melt('row', value_name='val', var_name='col')\
  .sort_values(['row', 'col'])

  row col  val
0   A   A    1
3   A   B    2
6   A   C    3
1   B   A    4
4   B   B    5
7   B   C    6
2   C   A    7
5   C   B    8
8   C   C    9

Option 2
Another simple option with stack -

v = df.stack().reset_index()
v.columns=['row', 'col', 'val']
v

Or,

df.stack().rename_axis(['row', 'col']).reset_index(name='val')

  row col  val
0   A   A    1
1   A   B    2
2   A   C    3
3   B   A    4
4   B   B    5
5   B   C    6
6   C   A    7
7   C   B    8
8   C   C    9
cs95
  • 379,657
  • 97
  • 704
  • 746
3

Method #1

NumPy approach -

# https://stackoverflow.com/a/11146645/ @senderle
def cartesian_product(*arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(np.ix_(*arrays)):
        arr[...,i] = a
    return arr.reshape(-1, la)

def flatten_dataframe(df):
    c = df.columns.values.astype(str)
    i = df.index.values.astype(str)
    df_out = pd.DataFrame(cartesian_product(i,c),columns=[['row','col']])
    df_out['val'] = df.values.ravel()
    return df_out 

Sample run -

In [598]: df
Out[598]: 
   A  B  C
P  1  2  3
Q  4  5  6
R  7  8  9

In [599]: flatten_dataframe(df)
Out[599]: 
  row col  val
0   P   A    1
1   P   B    2
2   P   C    3
3   Q   A    4
4   Q   B    5
5   Q   C    6
6   R   A    7
7   R   B    8
8   R   C    9

Method #2 : object dtype dataframe

If you are okay with object dtype dataframe, here's another way -

def flatten_dataframe_obj(df):
    c = df.columns.values.astype(str)
    i = df.index.values.astype(str)
    m = len(i)
    n = len(c)
    out = np.empty((m,n,3),dtype=object)
    out[...,0] = i[:,None]
    out[...,1] = c
    out[...,2] = df.values
    return pd.DataFrame(out.reshape(-1,3), columns=[['row','col','val']])
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • 1
    Nice that I get to see some great numpy solutions to pandas questions every now and then. – cs95 Jan 13 '18 at 21:54
2

This should be fast :-)

new_df=pd.DataFrame({'row':df.index.repeat(df.shape[1]),'col':df.columns.tolist()*len(df),'val':np.concatenate(df.values)})
new_df
Out[343]: 
  col row  val
0   A   A    1
1   B   A    2
2   C   A    3
3   A   B    4
4   B   B    5
5   C   B    6
6   A   C    7
7   B   C    8
8   C   C    9
BENY
  • 317,841
  • 20
  • 164
  • 234