1

I have a large pandas dataframe with 8 columns and several NaN values:

0   1   2   3   4   5   6   7   8
1   Google, Inc. (Date 11/07/2016)  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2   Apple Inc. (Date 07/01/2016)    Amazon (Date 11/01/2016)    NaN     NaN     NaN     NaN     NaN     NaN     NaN
3   IBM, Inc. (Date 11/08/2016)     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
4   Microsoft (Date 11/10/2016)     Google, Inc. (Date 11/10/1990)  Google, Inc. (Date 11/07/2016)  Samsung (Date 05/02/2016)   NaN     NaN     NaN     NaN     NaN

How can I flatten down it like this:

0   companies
1   Google, Inc. (Date 11/07/2016)
2   Apple Inc. (Date 07/01/2016)
3   Amazon (Date 11/01/2016)
4   IBM, Inc. (Date 11/08/2016)
5   Microsoft (Date 11/10/2016)
6   Google, Inc. (Date 11/10/1990)
7   Google, Inc. (Date 11/07/2016)
8   Samsung (Date 05/02/2016)

I read the docs and tried:

df.iloc[:,0]

The problem is that I lost information and order over the other columns. I idea of how to flat without lost data in the other cells and order?.

student
  • 347
  • 3
  • 13

2 Answers2

2

You can stack the columns and optionally reset the index. By default, stack drops NaN's.

df.stack()
Out: 
0  0    Google, Inc. (Date 11/07/2016) 
1  0      Apple Inc. (Date 07/01/2016) 
   1          Amazon (Date 11/01/2016) 
2  0       IBM, Inc. (Date 11/08/2016) 
3  0       Microsoft (Date 11/10/2016) 
   1    Google, Inc. (Date 11/10/1990) 
   2    Google, Inc. (Date 11/07/2016) 
   3         Samsung (Date 05/02/2016) 
dtype: object

df.stack().reset_index(drop=True)
Out: 
0    Google, Inc. (Date 11/07/2016) 
1      Apple Inc. (Date 07/01/2016) 
2          Amazon (Date 11/01/2016) 
3       IBM, Inc. (Date 11/08/2016) 
4       Microsoft (Date 11/10/2016) 
5    Google, Inc. (Date 11/10/1990) 
6    Google, Inc. (Date 11/07/2016) 
7         Samsung (Date 05/02/2016) 
dtype: object
ayhan
  • 70,170
  • 20
  • 182
  • 203
  • Thanks for the help. and what about if I am interested in preserve the nan spaces into the stack?.. should I do: `drop = False` – student Nov 13 '16 at 21:31
  • 1
    That drop is for dropping the index. Instead you should use `df.stack(dropna=False)` to preserve NaNs. – ayhan Nov 13 '16 at 21:32
  • thanks, I got: `AttributeError: 'Series' object has no attribute 'stack'` – student Nov 13 '16 at 21:35
  • 1
    Do you try it on the original DataFrame you posted here? In order to get that error you should have called that method on a Series. – ayhan Nov 13 '16 at 21:51
  • Yes, this must be in the original. – student Nov 13 '16 at 21:52
1

This probably do the trick:

df = pd.DataFrame([
        ["Google, Inc. (Date 11/07/2016)", float("NaN")], 
        ["Apple Inc. (Date 07/01/2016)", "Amazon (Date 11/01/2016)"]])
unstacked = df.T.unstack()
unstacked.dropna(inplace=True)
unstacked.reset_index(drop=True, inplace=True)
unstacked

Output:

0    Google, Inc. (Date 11/07/2016)
1      Apple Inc. (Date 07/01/2016)
2          Amazon (Date 11/01/2016)
dtype: object

P.S. Please, take a look at this question on providing good pandas examples in questions.

Community
  • 1
  • 1
Ilya V. Schurov
  • 7,687
  • 2
  • 40
  • 78