Segregating the data in Python into multiple columns instead of repeating rows

Question

I have data as follows:

Events  Days
Event1   1
Event2   2
Event3   3
Event4   3
Event1   1
Event1   2
Event5   4
Event3   4
Event4   1

I need the following format:

Event1  1   1   2
Event2  2   0   0
Event3  3   4   0
Event4  3   1   0
Event5  4   0   0

i.e. one row per unique event, with its day values spread across different columns (0 where an event has fewer values). How do I do this in Python?
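The answers assume the data is already in a pandas DataFrame named df. A minimal sketch for building it from the whitespace-separated sample above, assuming pandas is installed; the inline string raw is only a stand-in for the real data source:

```python
import io

import pandas as pd

# Sample data exactly as shown in the question (whitespace-separated)
raw = """Events  Days
Event1   1
Event2   2
Event3   3
Event4   3
Event1   1
Event1   2
Event5   4
Event3   4
Event4   1
"""

# sep=r'\s+' treats any run of spaces as a single delimiter
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')
print(df)
```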

Willem Van Onsem, yes, this is pandas. Could you please help me get started? — Cyley Simon, Jan 19 '18 at 11:17
You have to use small caps, cᴏʟᴅsᴘᴇᴇᴅ, not COLDSPEED. Also, you can type "@" and tab complete the name, otherwise the user does not get pinged. Like this, @WillemVanOnsem — cs95, Jan 19 '18 at 11:20
@WillemVanOnsem Sorry, my intention was to show them how to tag users, using you as an example. My mistake! — cs95, Jan 19 '18 at 11:53

score 0 · Answer 1 · answered Jan 19 '18 at 11:35

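You can use cumcount to build a counter within each Events group and use it as the new column names, then reshape with set_index + unstack or with pandas.pivot. With unstack the gaps can be filled directly via fill_value=0; with pivot it is necessary to replace the NaNs with fillna and finally cast back to int: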

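import pandas as pd

# counter within each Events group (0, 1, 2, ...), used as the new column labels
g = df.groupby('Events').cumcount()
# index by (Events, counter), then move the counter level into columns; gaps become 0
df = df.set_index(['Events', g])['Days'].unstack(fill_value=0)
print(df)
        0  1  2
Events         
Event1  1  1  2
Event2  2  0  0
Event3  3  4  0
Event4  3  1  0
Event5  4  0  0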
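To see why this works, it helps to look at the intermediate counter. A minimal sketch that rebuilds the sample frame inline (the display column n is only for illustration): cumcount numbers each row within its Events group starting at 0, and because every (Events, n) pair is unique, it can serve as a (row, column) coordinate for unstack.

```python
import pandas as pd

df = pd.DataFrame({
    'Events': ['Event1', 'Event2', 'Event3', 'Event4', 'Event1',
               'Event1', 'Event5', 'Event3', 'Event4'],
    'Days': [1, 2, 3, 3, 1, 2, 4, 4, 1],
})

# Position of each row inside its Events group: 0 for the first occurrence, 1 for the second, ...
n = df.groupby('Events').cumcount()
print(df.assign(n=n))

# (Events, n) pairs are unique, so unstacking n yields one column per position
wide = df.set_index(['Events', n])['Days'].unstack(fill_value=0)
print(wide)
```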

Alternative:

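# pivot on column labels: the old array-based pd.pivot(index, columns, values) form fails in current pandas
df = (df.assign(n=df.groupby('Events').cumcount())
        .pivot(index='Events', columns='n', values='Days')
        .fillna(0).astype(int)       # gaps are NaN (float), so fill and cast back
        .rename_axis(columns=None))  # drop the 'n' label from the column header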
print (df)

        0  1  2
Events         
Event1  1  1  2
Event2  2  0  0
Event3  3  4  0
Event4  3  1  0
Event5  4  0  0
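The reason pivot needs fillna and astype is that positions an event never reaches, such as (Event2, 1), come back as NaN, which forces the values to float. A small sketch to inspect this, assuming a recent pandas where DataFrame.pivot takes column labels; the helper column n is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'Events': ['Event1', 'Event2', 'Event3', 'Event4', 'Event1',
               'Event1', 'Event5', 'Event3', 'Event4'],
    'Days': [1, 2, 3, 3, 1, 2, 4, 4, 1],
})

raw = (df.assign(n=df.groupby('Events').cumcount())
         .pivot(index='Events', columns='n', values='Days'))
# Missing (event, position) cells are NaN, so the values are upcast to float
print(raw)
print(raw.dtypes)

# Filling the holes first makes the cast back to int safe
clean = raw.fillna(0).astype(int)
print(clean)
```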

score 0 · Answer 2 · answered Jan 19 '18 at 11:40

In addition to the other answers, here is an alternative solution that does not need pandas, using only the csv and collections modules:

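import csv
import collections

with open('data.txt') as f:
    # header is "Events  Days"; skipinitialspace ignores the extra spaces after each delimiter
    reader = csv.DictReader(f, delimiter=' ', skipinitialspace=True)
    # event name -> list of its Days values, in file order
    groups = collections.defaultdict(list)

    for r in reader:
        groups[r['Events']].append(r['Days'])

# the longest group decides how many day columns every row gets
max_len = max(len(v) for v in groups.values())
for k, v in sorted(groups.items()):
    # pad shorter groups with '0' so all rows have the same width
    v += ['0'] * (max_len - len(v))
    print(k, '\t'.join(v), sep='\t')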

The output:

Event1  1   1   2
Event2  2   0   0
Event3  3   4   0
Event4  3   1   0
Event5  4   0   0
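The same idea can be wrapped in a function that accepts any iterable of lines, which makes it easy to test without a data.txt file. A minimal sketch under that assumption (the name widen and its pad parameter are hypothetical); like the answer above, it uses only the standard library:

```python
import csv
import collections
import io


def widen(lines, pad='0'):
    """Group Days by Events and pad every group to the same length."""
    reader = csv.DictReader(lines, delimiter=' ', skipinitialspace=True)
    groups = collections.defaultdict(list)
    for row in reader:
        groups[row['Events']].append(row['Days'])

    # default=0 keeps an input with only a header from raising ValueError
    width = max(map(len, groups.values()), default=0)
    # Sorted by event name; each row is [event, day, day, ...]
    return [[event] + days + [pad] * (width - len(days))
            for event, days in sorted(groups.items())]


sample = io.StringIO(
    "Events  Days\nEvent1   1\nEvent2   2\nEvent3   3\nEvent4   3\n"
    "Event1   1\nEvent1   2\nEvent5   4\nEvent3   4\nEvent4   1\n"
)
for row in widen(sample):
    print('\t'.join(row))
```

Returning rows instead of printing keeps formatting separate from grouping, so the result can be written with csv.writer or fed into other code.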
