Split dataframe on unique value and add as new column merged on index

Question

Usually I am able to solve my problems from browsing SO. However, I am stuck here.

Assume I have a dataframe looking like this:

import itertools
lst = range(1,3)
group_names = ["GROUP1","GROUP2","GROUP3"]

rng = pd.date_range('1/1/2011', periods=24, freq='H')
time_index = pd.Series(list(itertools.chain.from_iterable(itertools.repeat(rng, 3))), name="Time")
values = pd.Series(np.random.randn(len(time_index)), index=time_index, name="Values")
groups = pd.Series(list(itertools.chain.from_iterable(itertools.repeat(x, 24) for x in group_names)), index=time_index,  name="Groups")
df = pd.concat([values, groups], axis=1)

The dataframe should like this:

                       Values  Groups
Time                                 
2011-01-01 00:00:00  0.339785  GROUP1
2011-01-01 01:00:00  0.384126  GROUP1
2011-01-01 02:00:00 -0.096264  GROUP1
...
2011-01-01 01:00:00  1.448881  GROUP2
2011-01-01 02:00:00  1.448881  GROUP2
2011-01-01 03:00:00  1.448881  GROUP2
2011-01-01 04:00:00 -0.256500  GROUP2
2011-01-01 05:00:00 -1.818228  GROUP2

Note that the Time Index is 24 hours, but always for the same day. I want to achieve that each unique group in the "Groups" column will be inserted as a new column with the respective value for the "Values" column. All columns now are merged on the same Time Index.

The aspired outcome:

df2 = pd.DataFrame(np.random.randn(24, 3), columns=group_names, index=rng)

                       GROUP1    GROUP2    GROUP3
2011-01-01 00:00:00 -0.144325 -0.883371  2.245540
2011-01-01 01:00:00 -0.311808 -0.854769  0.229355
2011-01-01 02:00:00  0.983453 -1.741552  2.409349
2011-01-01 03:00:00  0.303125 -1.363857  0.484512
...
2011-01-01 19:00:00  1.129852  1.374176 -0.639436
2011-01-01 20:00:00 -0.793817 -1.834729 -0.089721
2011-01-01 21:00:00  1.389437 -1.198244 -0.430752
2011-01-01 22:00:00  0.321585 -0.107011 -0.083659
2011-01-01 23:00:00  0.886643 -0.390729 -2.380853

So far I have tried couple of attemps, starting from groupby to a simple loop.

for group in df.Groups.unique():
    df[group ] = df[(df2.Groups== group )]["Value"]

However, this results in NaN for all the duplicate indices.

I hope my explanation is clear enough.

Anyone an idea?

I appreciate the pains you've taken to provide a MWE, but this question has been asked before in various shapes and forms. The answer is usually pivoting the dataframe. — cs95, Dec 19 '17 at 19:22
Not entirely impossible, If you knew what "pivot"ing a dataframe was, you wouldn't need to ask the question in the first place :-) — cs95, Dec 19 '17 at 19:42

Split dataframe on unique value and add as new column merged on index

0 Answers0