I'm unsure how to describe my problem, and therefore how to phrase my question. This is surely a well-known issue, but I couldn't find out how to solve it.

I have a dataframe like this

     id        time  value
0   100  2012-04-01    2.3
1   100  2012-04-02    3.4
2   100  2012-04-03    1.4
3   100  2012-04-04    5.6
4   200  2012-04-01    3.0
5   200  2012-04-02    2.9
6   200  2012-04-03    2.8
7   200  2012-04-04    1.9
8   300  2012-04-01    3.1
9   300  2012-04-02    4.2
10  300  2012-04-03    2.5
11  300  2012-04-04    1.2

Here the unique ids each define a time series. In this case I have 3 time series: 100, 200 and 300. Each of them has one value per date. (Notice that they all start on 2012-04-01 and end on 2012-04-04, so they all cover the same period.)

What I want to do is have a fast way to transform the above into something like

        time      100    200    300
0   2012-04-01    2.3    3.0    3.1
1   2012-04-02    3.4    2.9    4.2
2   2012-04-03    1.4    2.8    2.5
3   2012-04-04    5.6    1.9    1.2

Or even better with the time as index

    time      100    200    300
2012-04-01    2.3    3.0    3.1
2012-04-02    3.4    2.9    4.2
2012-04-03    1.4    2.8    2.5
2012-04-04    5.6    1.9    1.2

MY TRY

Given that the dataframe is called df:

list_dfs = []
for i in df['id'].unique():
    # Keep only the rows of series i, index them by time, and name the value column after the id
    list_dfs.append(df[df['id'] == i].set_index('time').drop('id', axis=1).rename(columns={'value': i}))

After this I can use pd.concat, but it stacks the frames on top of each other, so every date is repeated once per id and the remaining columns are filled with NaN

pd.concat(list_dfs)

and I get

            100  200  300
time                     
2012-04-01  2.3  NaN  NaN
2012-04-02  3.4  NaN  NaN
2012-04-03  1.4  NaN  NaN
2012-04-04  5.6  NaN  NaN
2012-04-01  NaN  3.0  NaN
2012-04-02  NaN  2.9  NaN
2012-04-03  NaN  2.8  NaN
2012-04-04  NaN  1.9  NaN
2012-04-01  NaN  NaN  3.1
2012-04-02  NaN  NaN  4.2
2012-04-03  NaN  NaN  2.5
2012-04-04  NaN  NaN  1.2
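For comparison, here is a minimal sketch of the same loop with the frames concatenated along the columns instead of the rows. It assumes that passing axis=1 to pd.concat is what the attempt above is missing, and that each series has unique dates so the frames can be aligned on the time index. The name wide_concat is only illustrative.

```python
import pandas as pd

dates = ['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04']
df = pd.DataFrame({
    'id': [100] * 4 + [200] * 4 + [300] * 4,
    'time': dates * 3,
    'value': [2.3, 3.4, 1.4, 5.6, 3.0, 2.9, 2.8, 1.9, 3.1, 4.2, 2.5, 1.2],
})

# One single-column frame per id, indexed by time (same loop as in the attempt)
list_dfs = []
for i in df['id'].unique():
    list_dfs.append(
        df[df['id'] == i]
        .set_index('time')
        .drop('id', axis=1)
        .rename(columns={'value': i})
    )

# axis=1 places the frames side by side and aligns them on the shared time index
wide_concat = pd.concat(list_dfs, axis=1)
print(wide_concat)
```

This still filters the dataframe once per id, so it is likely to be slow when there are many series.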

Code to generate the df for the MWE

import pandas as pd

df = pd.DataFrame({
    'id': [100, 100, 100, 100, 200, 200, 200, 200, 300, 300, 300, 300],
    'time': ['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
             '2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
             '2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04'],
    'value': [2.3, 3.4, 1.4, 5.6, 3.0, 2.9, 2.8, 1.9, 3.1, 4.2, 2.5, 1.2],
})

Ideas

Notice that I don't necessarily need all that fancy stuff. All I want is a matrix with the time series side by side, one column per id.

I think there must be a way using pandas groupby but I cannot seem to find it.
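One possible direction, as a hedged sketch rather than a definitive answer: this long-to-wide reshape looks like what DataFrame.pivot does, and the closest groupby-style idiom is probably set_index followed by unstack. The sketch assumes each (time, id) pair occurs at most once, since pivot raises an error on duplicates. The names wide, wide_unstack and flat are only illustrative.

```python
import pandas as pd

dates = ['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04']
df = pd.DataFrame({
    'id': [100] * 4 + [200] * 4 + [300] * 4,
    'time': dates * 3,
    'value': [2.3, 3.4, 1.4, 5.6, 3.0, 2.9, 2.8, 1.9, 3.1, 4.2, 2.5, 1.2],
})

# Long to wide: one row per time, one column per id, cells taken from 'value'
wide = df.pivot(index='time', columns='id', values='value')

# Same reshape via a two-level index: move the 'id' level into the columns
wide_unstack = df.set_index(['time', 'id'])['value'].unstack('id')

# Variant with time as an ordinary column and a default integer index
flat = wide.reset_index().rename_axis(columns=None)

print(wide)
print(flat)
```

If duplicate (time, id) pairs can occur, pivot_table with an explicit aggfunc would be the alternative to look at, at the cost of aggregating the duplicates.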

Euler_Salter