I'm using Pandas to do some charting with Altair. Before passing the DataFrame to Altair, I want the option to resample the time-series data. I have this working, but it creates a hierarchical (MultiIndex) DataFrame that Altair can't use, so I'm trying to flatten the data back to the original format. I've tried a whole bunch of things that seem like they almost fix this, but I can't quite get it right.

The initial data is a CSV with a bunch of rows, each containing a term, a score, and a Unix timestamp (in seconds) for news terms:

james comey,0.00,1524207600
congress,0.00,1524207600
meme,0.17,1524207600
video,0.38,1524207600
barbara bush,2.01,1524207600
trump,2.98,1524207600
...
james comey,0.00,1524211200
congress,0.00,1524211200
meme,0.17,1524211200
video,0.51,1524211200
barbara bush,2.01,1524211200

This is then parsed with pandas:

import pandas as pd
from datetime import datetime

def dateparse(timestamp):
    return datetime.fromtimestamp(int(timestamp))

data = pd.read_csv("data.csv",
                   parse_dates=[2],
                   date_parser=dateparse,
                   names=["term", "score", "timestamp"],
                   header=None)
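
As a side note, `date_parser` is deprecated as of pandas 2.0, so here is a rough sketch of an alternative: read the timestamp column as plain integers and convert it afterwards with `pd.to_datetime(..., unit="s")`. The inline `csv_text` sample is made up for illustration and stands in for `data.csv`. One assumption to be aware of: this conversion yields UTC-based naive timestamps, whereas `datetime.fromtimestamp` uses the machine's local time zone, so the values may differ by the local UTC offset.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for data.csv (same column layout).
csv_text = "meme,0.17,1524207600\nvideo,0.38,1524207600\n"

data = pd.read_csv(
    io.StringIO(csv_text),
    names=["term", "score", "timestamp"],
    header=None,
)

# Convert epoch seconds to datetime64 values (interpreted as UTC, no tz attached).
data["timestamp"] = pd.to_datetime(data["timestamp"], unit="s")
```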

From there we do the resample:

x = data.groupby(['term']).resample('24h', on='timestamp').mean()

This produces:

                               score
term            timestamp           
barbara bush    2018-04-20  2.499167
                2018-04-21  5.109167
                2018-04-22  4.030000
                2018-04-23  1.518333
                2018-04-24  1.120000
congress        2018-04-20  0.035000
                2018-04-21  0.005833
                2018-04-22  0.046667
                2018-04-23  0.028333
                2018-04-24  0.000000
...

Looking good so far. (Sort of? I think the score is the only column, but the data looks almost right.) Now the next thing I want is to rearrange this so it's back in the original format, something like:

term            timestamp   score   
barbara bush    2018-04-20  2.499167
barbara bush    2018-04-21  5.109167
barbara bush    2018-04-22  4.030000
barbara bush    2018-04-23  1.518333
barbara bush    2018-04-24  1.120000
congress        2018-04-20  0.035000
congress        2018-04-21  0.005833
congress        2018-04-22  0.046667
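
For reference, the layout above is what `reset_index()` normally gives when applied to a result indexed by (term, timestamp): each index level becomes an ordinary column again. The following is only a minimal sketch under a few assumptions: the `rows` sample is made up, the timestamps are converted with `pd.to_datetime(..., unit="s")` rather than the `dateparse` helper, and `score` is selected explicitly before taking the mean. It has not been run against the full dataset.

```python
import pandas as pd

# Made-up sample in the same term/score/timestamp layout as the CSV.
rows = [
    ("meme", 0.17, 1524207600),
    ("video", 0.38, 1524207600),
    ("meme", 0.17, 1524211200),
    ("video", 0.51, 1524211200),
]
data = pd.DataFrame(rows, columns=["term", "score", "timestamp"])
data["timestamp"] = pd.to_datetime(data["timestamp"], unit="s")

# Per-term daily mean; the result is a Series indexed by (term, timestamp).
resampled = data.groupby("term").resample("24h", on="timestamp")["score"].mean()

# Move both index levels back into regular columns: term, timestamp, score.
flat = resampled.reset_index()
print(flat)
```

If this holds for the real data, `flat` should be a plain DataFrame with a default integer index, which is the kind of input Altair expects.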

I've tried unstacking, melting, pivoting, swaplevel/reorder_levels (those looked almost good!) and damned near anything else I can find in the documentation, but I'm not having much luck.

Thoughts?

ragona