0

This follows a question I posed earlier.

How do I convert the following lines:

   time1,stockA,bid,1
   time2,stockA,ask,1.1
   time3,stockB,ask,2.1
   time4,stockB,bid,2.0
   time5,stockA,bid,1.1
   time6,stockA,ask,1.2
   time7,stockA,high,1.5
   time8,stockA,low,0.5

to the following panda dataframe:

  time     stock       bid    ask    high    low
  time1    stockA      1      
  time2    stockA             1.1
  time3    stockB             2.1
  time4    stockB      2.0    
  time5    stockA      1.1
  time6    stockA             1.2
  time7    stockA                     1.5
  time8    stockA                            0.5

Any help is appreciated!

Community
  • 1
  • 1
qts
  • 984
  • 2
  • 14
  • 25

2 Answers2

2

What you want to do is pivot the table. The following method results in the time and stock forming a MultiInde

 df = pd.read_csv('prices.csv', header=None, names=['time', 'stock', 'type',   'prices'], 
                  index_col=['time', 'stock', 'type'])

In [1062]:

df
Out[1062]:
                    prices
time    stock   type    
time1   stockA  bid 1.0
time2   stockA  ask 1.1
time3   stockB  ask 2.1
time4   stockB  bid 2.0
time5   stockA  bid 1.1
time6   stockA  ask 1.2
time7   stockA  high1.5
time8   stockA  low 0.5

I think that's what the DataFrame should look like. Then do

In [1064]:

df.unstack()
Out[1064]:
                prices
type            ask bid high low
time    stock               
time1   stockA  NaN 1.0 NaN NaN
time2   stockA  1.1 NaN NaN NaN
time3   stockB  2.1 NaN NaN NaN
time4   stockB  NaN 2.0 NaN NaN
time5   stockA  NaN 1.1 NaN NaN
time6   stockA  1.2 NaN NaN NaN
time7   stockA  NaN NaN 1.5 NaN
time8   stockA  NaN NaN NaN 0.5

You can fill the Nans with whatever you prefer using df.fillna. Generally speaking, converting a columns values into column headers is called pivoting. .unstack pivots a level of a MultiIndex. You can check .pivot as well. You can do

df.columns = df.columns.droplevel(0)

To get rid of the outer level in the columns which contains 'prices' for each column

JoeCondron
  • 8,546
  • 3
  • 27
  • 28
  • Interesting idea. But when I try it I get: `ValueError: Index contains duplicate entries, cannot reshape.` However it works when I add the some headers to the original data file, e.g. `datetime,stock,type,prices`. – qts May 13 '15 at 14:01
  • You should be able to add those headers when reading the file using the ```names``` parameter as I have in my code. Anyway, I'm glad you got it to work. – JoeCondron May 14 '15 at 07:58
  • I have to physically add the header as well as use the `names` parameter for it to work, which means I end up with a redundant row header. I have no idea why this is. It seems both unstack and pivot have an issue with 'duplicate entries' as I've discovered here http://stackoverflow.com/questions/11232275/pandas-pivot-warning-about-repeated-entries-on-index. After that I came across the `pivot_table` function which seems to work. – qts May 14 '15 at 08:35
  • So, substituting `df.unstack()` with `pd.pivot_table(df,index=['time','stk'],columns='type',values='value')` in your solution finally worked for me. – qts May 14 '15 at 08:42
  • Yeah, which of pivot, pivot table and unstack you want to use will depend on how your data is structured and what you want to achieve. – JoeCondron May 14 '15 at 09:38
1

My approach would be to read the csv into 2 df's, one with and without the bid ask columns:

In [99]:

t="""time1,stockA,bid,1
 time2,stockA,ask,1.1
 time3,stockB,ask,2.1
 time4,stockB,bid,2.0
 time5,stockA,bid,1.1
 time6,stockA,ask,1.2
 time7,stockA,high,1.5
 time8,stockA,low,0.5"""
​
df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'], usecols=['time', 'stock'])
df
Out[99]:
     time   stock
0   time1  stockA
1   time2  stockA
2   time3  stockB
3   time4  stockB
4   time5  stockA
5   time6  stockA
6   time7  stockA
7   time8  stockA

For the second df, we can call pivot to rotate the df to create the columns from the 'bid' values, we need to reset the index and then we can merge the 2 df's together to get your desired result, you can replace the NaN values with blank strings if needed:

In [102]:

df_new = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'], usecols=['time','bid','ask'])
df_new = df_new.pivot(columns ='bid', values='ask', index='time')
df_new = df_new.reset_index()
df = df.merge(df_new)
df
Out[102]:
     time   stock  ask  bid  high  low
0   time1  stockA  NaN  1.0   NaN  NaN
1   time2  stockA  1.1  NaN   NaN  NaN
2   time3  stockB  2.1  NaN   NaN  NaN
3   time4  stockB  NaN  2.0   NaN  NaN
4   time5  stockA  NaN  1.1   NaN  NaN
5   time6  stockA  1.2  NaN   NaN  NaN
6   time7  stockA  NaN  NaN   1.5  NaN
7   time8  stockA  NaN  NaN   NaN  0.5
EdChum
  • 376,765
  • 198
  • 813
  • 562