Which is the most efficient way of flattening down a pandas dataframe?

Question

I have a large pandas dataframe with 8 columns and several NaN values:

0   1   2   3   4   5   6   7   8
1   Google, Inc. (Date 11/07/2016)  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2   Apple Inc. (Date 07/01/2016)    Amazon (Date 11/01/2016)    NaN     NaN     NaN     NaN     NaN     NaN     NaN
3   IBM, Inc. (Date 11/08/2016)     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
4   Microsoft (Date 11/10/2016)     Google, Inc. (Date 11/10/1990)  Google, Inc. (Date 11/07/2016)  Samsung (Date 05/02/2016)   NaN     NaN     NaN     NaN     NaN

How can I flatten down it like this:

0   companies
1   Google, Inc. (Date 11/07/2016)
2   Apple Inc. (Date 07/01/2016)
3   Amazon (Date 11/01/2016)
4   IBM, Inc. (Date 11/08/2016)
5   Microsoft (Date 11/10/2016)
6   Google, Inc. (Date 11/10/1990)
7   Google, Inc. (Date 11/07/2016)
8   Samsung (Date 05/02/2016)

I read the docs and tried:

df.iloc[:,0]

The problem is that I lost information and order over the other columns. I idea of how to flat without lost data in the other cells and order?.

score 2 · Accepted Answer · answered Nov 13 '16 at 21:25

2

You can stack the columns and optionally reset the index. By default, stack drops NaN's.

df.stack()
Out: 
0  0    Google, Inc. (Date 11/07/2016) 
1  0      Apple Inc. (Date 07/01/2016) 
   1          Amazon (Date 11/01/2016) 
2  0       IBM, Inc. (Date 11/08/2016) 
3  0       Microsoft (Date 11/10/2016) 
   1    Google, Inc. (Date 11/10/1990) 
   2    Google, Inc. (Date 11/07/2016) 
   3         Samsung (Date 05/02/2016) 
dtype: object

df.stack().reset_index(drop=True)
Out: 
0    Google, Inc. (Date 11/07/2016) 
1      Apple Inc. (Date 07/01/2016) 
2          Amazon (Date 11/01/2016) 
3       IBM, Inc. (Date 11/08/2016) 
4       Microsoft (Date 11/10/2016) 
5    Google, Inc. (Date 11/10/1990) 
6    Google, Inc. (Date 11/07/2016) 
7         Samsung (Date 05/02/2016) 
dtype: object

answered Nov 13 '16 at 21:25

ayhan

70,170
20
182
203

Thanks for the help. and what about if I am interested in preserve the nan spaces into the stack?.. should I do: `drop = False` – student Nov 13 '16 at 21:31
1

That drop is for dropping the index. Instead you should use `df.stack(dropna=False)` to preserve NaNs. – ayhan Nov 13 '16 at 21:32
thanks, I got: `AttributeError: 'Series' object has no attribute 'stack'` – student Nov 13 '16 at 21:35
1

Do you try it on the original DataFrame you posted here? In order to get that error you should have called that method on a Series. – ayhan Nov 13 '16 at 21:51
Yes, this must be in the original. – student Nov 13 '16 at 21:52

score 1 · Answer 2 · edited May 23 '17 at 12:24

1

This probably do the trick:

df = pd.DataFrame([
        ["Google, Inc. (Date 11/07/2016)", float("NaN")], 
        ["Apple Inc. (Date 07/01/2016)", "Amazon (Date 11/01/2016)"]])
unstacked = df.T.unstack()
unstacked.dropna(inplace=True)
unstacked.reset_index(drop=True, inplace=True)
unstacked

Output:

0    Google, Inc. (Date 11/07/2016)
1      Apple Inc. (Date 07/01/2016)
2          Amazon (Date 11/01/2016)
dtype: object

P.S. Please, take a look at this question on providing good pandas examples in questions.

edited May 23 '17 at 12:24

Community

1
1

answered Nov 13 '16 at 21:25

Ilya V. Schurov

7,687
2
40
78

It seems that @ayhan's answer is better. – Ilya V. Schurov Nov 13 '16 at 21:28
Thanks for the help. and what about if I am interested in preserve the nan spaces into the stack?.. should I do: `drop = False`? – student Nov 13 '16 at 21:32
1

You have to remove line `unstacked.dropna(inplace=True)`. – Ilya V. Schurov Nov 13 '16 at 22:15

Which is the most efficient way of flattening down a pandas dataframe?

2 Answers2