Python: reorganize a dataframe with repeated values appearing in one column.

Question

I have a dataframe that looks like this:

     Instrument                  Date  Total Return
0  KYG2615B1014  2017-11-29T00:00:00Z      0.000000
1  KYG2615B1014  2017-11-28T00:00:00Z    -10.679612
2  KYG2615B1014  2017-11-27T00:00:00Z     -8.035714
3  JP3843250006  2017-11-29T00:00:00Z      0.348086
4  JP3843250006  2017-11-28T00:00:00Z      0.349301
5  JP3843250006  2017-11-27T00:00:00Z      0.200000

Given that dataframe, I would like to make it look like this:

              11/27/2017  11/28/2017  11/29/2017
KYG2615B1014   -8.035714  -10.679612    0.000000
JP3843250006    0.200000    0.349301    0.348086

Basically what I want is to place every date as a new column and inside that column, placing the corresponding value. I wouldn't say "filtering" or "deleting" duplicates, I'd say this is much more like rearranging.

Both dataframes were generated by me, but the thing is that to acquire this data I have to make a call to an API. In the 1st dataframe I make only one call and pull all of this data, while in the other I make one call per each date. So 1st is much more efficient than the 2nd and figured it was the right call, but I'm stuck in this part of reorganizing the dataframe to what I need.

I thought of creating an empty dataframe and then populate it, by picking indexes of repeated elements in the 'Instrument' column, use those indexes to get elements from the 'Total Return' column and then place the elements from that chunk of data accordingly, but I don't know how to do it.

If someone can help me, I'll be happy to know.

Not sure if useful at this point, but this how I generated the dataframe (before populating it) in the 2nd version:

import pandas as pd
import datetime

#Getting a list of dates
start=datetime.date(2017,11,27)
end=datetime.date.today() - datetime.timedelta(days=1)
row_dates=[x.strftime('%m/%d/%Y') for x in pd.bdate_range(start,end).tolist()]
#getting identifiers to be used on Eikon
csv_data=pd.read_csv('171128.csv', header=None)
identifiers=csv_data[0].tolist()
df=pd.DataFrame(index=identifiers, columns=row_dates)

https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe — BENY, Nov 30 '17 at 19:50
@Wen thanks! I'll save this for future references. My bad for not doing a deeper search. — Aquiles Páez, Nov 30 '17 at 20:27

score 1 · Accepted Answer · answered Nov 30 '17 at 19:44

1

You can use pd.crosstab:

pd.crosstab(df.Instrument, df['Date'],values=df['Total Return'], aggfunc='mean')

Output:

Date          2017-11-27T00:00:00Z  2017-11-28T00:00:00Z  2017-11-29T00:00:00Z
Instrument                                                                    
JP3843250006              0.200000              0.349301              0.348086
KYG2615B1014             -8.035714            -10.679612              0.000000

answered Nov 30 '17 at 19:44

Scott Boston

147,308
15
139
187

THANK YOU! This worked beautifully. Added this line at the end to change the column headers to the dates in the format I wanted them to be (see the small script at the end of my post to know what row_dates contains): df.columns=row_dates – Aquiles Páez Nov 30 '17 at 20:23
1

Great! I am glad I could help. – Scott Boston Nov 30 '17 at 20:23
1

@AquilesPáez [pd.crosstab](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.crosstab.html#pandas-crosstab) supports one function at a time. – Scott Boston Dec 04 '17 at 03:17
Can I make you an additional question, Scott? For instance, I want to transform dates from: 2017-11-27T00:00:00Z to just '11/27/2017'. That is: separate the date from the string and reformat it? I'm asking because this isn't a Datetime string, which would make it easier for me. But I'm clueless in this case. – Aquiles Páez Dec 04 '17 at 16:37
Forget it! I saw this post which helped me a lot, thanks anyways! :) https://stackoverflow.com/questions/41455967/convert-datetime-string-to-new-columns-of-day-month-year-in-pandas-data-frame – Aquiles Páez Dec 04 '17 at 17:29

score 1 · Answer 2 · answered Nov 30 '17 at 20:21

This looks like pandas.pivot_table() pivot_table to me, note you can add an agg function if you think there will be duplicates (from example looks like only one reading per day).

import pandas as pd
instrument=['KYG2615B1014','KYG2615B1014','KYG2615B1014', 'JP3843250006', 'JP3843250006', 'JP3843250006']
date=['11/29/2017', '11/28/2017', '11/27/2017', '11/29/2017', '11/28/2017', '11/27/2017']
total_return=[0.0, -10.679612, -8.035714, 0.348086, 0.349301, 0.200000]
stacked = pd.DataFrame(dict(Instrument=instrument, Date=date, Total_return=total_return)
pd.pivot_table(stacked, values='Total_return', index='Instrument', columns='Date')

This returns the following:

Date    11/27/2017  11/28/2017  11/29/2017
Instrument          
JP3843250006    0.200000    0.349301    0.348086
KYG2615B1014    -8.035714   -10.679612  0.000000

Python: reorganize a dataframe with repeated values appearing in one column.

2 Answers2