41

I have a Series that looks like this:

1999-03-31  SOLD_PRICE     NaN
1999-06-30  SOLD_PRICE     NaN
1999-09-30  SOLD_PRICE     NaN
1999-12-31  SOLD_PRICE    3.00
2000-03-31  SOLD_PRICE    3.00

with an index that looks like:

MultiIndex
[(1999-03-31 00:00:00, u'SOLD_PRICE'), (1999-06-30 00:00:00, u'SOLD_PRICE'), 
 (1999-09-30 00:00:00, u'SOLD_PRICE'), (1999-12-31 00:00:00, u'SOLD_PRICE'),...]

I don't want the second column as an index. Ideally I'd have a DataFrame with column 1 as "Date" and column 2 as "Sales" (dropping the second level of the index). I don't quite see how to reconfigure the index.

cottontail
  • 10,268
  • 18
  • 50
  • 51
dartdog
  • 10,432
  • 21
  • 72
  • 121
  • 6
    Just a general comment: naming your variables with numbers indicating the order in which you're using them is a recipe for disaster. Consider the pathological case where `pandas` didn't have `DataFrame` or `Series` and they were just called `DataStructure1` and `DataStructure2` and all of their methods were just called `method1`, `method2`, and so on. Giving your variables meaningful names will make your code legible to others reading it for the first time. Giving meaningful names to your variables also allows *you* to come back to your code in a month and get back up to speed quickly. – Phillip Cloud Sep 04 '13 at 21:42

4 Answers4

61

Just call reset_index():

In [130]: s
Out[130]:
0           1
1999-03-31  SOLD_PRICE   NaN
1999-06-30  SOLD_PRICE   NaN
1999-09-30  SOLD_PRICE   NaN
1999-12-31  SOLD_PRICE     3
2000-03-31  SOLD_PRICE     3
Name: 2, dtype: float64

In [131]: s.reset_index()
Out[131]:
            0           1   2
0  1999-03-31  SOLD_PRICE NaN
1  1999-06-30  SOLD_PRICE NaN
2  1999-09-30  SOLD_PRICE NaN
3  1999-12-31  SOLD_PRICE   3
4  2000-03-31  SOLD_PRICE   3

There are many ways to drop columns:

Call reset_index() twice and specify a column:

In [136]: s.reset_index(0).reset_index(drop=True)
Out[136]:
            0   2
0  1999-03-31 NaN
1  1999-06-30 NaN
2  1999-09-30 NaN
3  1999-12-31   3
4  2000-03-31   3

Delete the column after resetting the index:

In [137]: df = s.reset_index()

In [138]: df
Out[138]:
            0           1   2
0  1999-03-31  SOLD_PRICE NaN
1  1999-06-30  SOLD_PRICE NaN
2  1999-09-30  SOLD_PRICE NaN
3  1999-12-31  SOLD_PRICE   3
4  2000-03-31  SOLD_PRICE   3

In [139]: del df[1]

In [140]: df
Out[140]:
            0   2
0  1999-03-31 NaN
1  1999-06-30 NaN
2  1999-09-30 NaN
3  1999-12-31   3
4  2000-03-31   3

Call drop() after resetting:

In [144]: s.reset_index().drop(1, axis=1)
Out[144]:
            0   2
0  1999-03-31 NaN
1  1999-06-30 NaN
2  1999-09-30 NaN
3  1999-12-31   3
4  2000-03-31   3

Then, after you've reset your index, just rename the columns

In [146]: df.columns = ['Date', 'Sales']

In [147]: df
Out[147]:
         Date  Sales
0  1999-03-31    NaN
1  1999-06-30    NaN
2  1999-09-30    NaN
3  1999-12-31      3
4  2000-03-31      3
Phillip Cloud
  • 24,919
  • 11
  • 68
  • 88
16

When you use double brackets, such as

H3 = H2[['SOLD_PRICE']]

H3 becomes a DataFrame. If you use single brackets,

H3 = H2['SOLD_PRICE']

then H3 becomes a Series. If H3 is a Series, then the result you desire follows naturally:

import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2011', periods=72, freq='M')
H2 = pd.DataFrame(np.arange(len(rng)), index=rng, columns=['SOLD_PRICE'])
H3 = H2['SOLD_PRICE']
H5 = H3.resample('Q', how='count')
H6 = pd.rolling_mean(H5,4)
print(H6.head())

yields

2011-03-31   NaN
2011-06-30   NaN
2011-09-30   NaN
2011-12-31     3
2012-03-31     3
dtype: float64
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
0

To remove a specific level from a MultiIndex, use droplevel. For example, to remove the second level:

H6 = H6.droplevel(1)

Then to convert the Series into a dataframe, use to_frame.

H6 = H6.to_frame('Sales')

res1


If you want to convert the Series into a dataframe with 2 columns, then droplevel() + reset_index() or two reset_index() calls with different arguments will do the job.

H6 = H6.droplevel(1).rename_axis('Date').reset_index(name='Sales')
# or 
H6 = H6.reset_index(level=0, name='Sales').reset_index(drop=True)

res2

cottontail
  • 10,268
  • 18
  • 50
  • 51
0

USING numpy.squeeze()

Having a multi-index dataframe :

import pandas as pd 

df = pd.DataFrame([[1,2], [2,3]], columns=[['col1', 'col2']])
print(df.columns)

MultiIndex([('col1',), ('col2',)], )

Then "squeeze" values with np.squeeze():

import numpy as np

val = np.squeeze(df.values).T
df_ = pd.DataFrame(val, [x[0] for x in df.columns]).T
print(df_.columns)

Index(['col1', 'col2'], dtype='object')

Tirbo06
  • 655
  • 6
  • 11