I have data as follows:

Events  Days
Event1   1
Event2   2
Event3   3
Event4   3
Event1   1
Event1   2
Event5   4
Event3   4
Event4   1

I need the following format:

Event1  1   1   2
Event2  2   0   0
Event3  3   4   0
Event4  3   1   0
Event5  4   0   0

i.e. one row per unique event, with its day values spread across separate columns (padded with 0). How do I do this in Python?

cs95
Cyley Simon

2 Answers

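You can use cumcount to build a per-group counter that becomes the new column names, then reshape with set_index + unstack (passing fill_value=0) or with pivot. With pivot the missing slots come out as NaN, so you need to replace them with fillna and cast back to int at the end: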

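# number the rows within each Events group: 0, 1, 2, ...
g = df.groupby('Events').cumcount()
# use that counter as the column level; missing slots are filled with 0
df = df.set_index(['Events', g])['Days'].unstack(fill_value=0)
print(df)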
        0  1  2
Events         
Event1  1  1  2
Event2  2  0  0
Event3  3  4  0
Event4  3  1  0
Event5  4  0  0
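
To make the role of cumcount easier to see, here is a minimal self-contained sketch that rebuilds the question's data and keeps the intermediate counter. The DataFrame construction and the names counter, wide and rows are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Events': ['Event1', 'Event2', 'Event3', 'Event4', 'Event1',
               'Event1', 'Event5', 'Event3', 'Event4'],
    'Days': [1, 2, 3, 3, 1, 2, 4, 4, 1],
})

# cumcount numbers the rows inside each Events group, starting at 0,
# so the third Event1 row gets 2, the second Event3 row gets 1, and so on.
g = df.groupby('Events').cumcount()
counter = g.tolist()  # [0, 0, 0, 0, 1, 2, 0, 1, 1]

# Put Events and the counter in the index, then move the counter into columns.
# fill_value=0 fills the slots that have no value for a given event.
wide = df.set_index(['Events', g])['Days'].unstack(fill_value=0)
rows = wide.values.tolist()
print(wide)
```

Keeping the result in a separate variable (wide) instead of overwriting df leaves the original long-format frame available for the pivot variant.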

Alternative:

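# start again from the original long-format df (before the unstack above);
# recent pandas versions expect DataFrame.pivot with column names
df = (df.assign(g=df.groupby('Events').cumcount())
        .pivot(index='Events', columns='g', values='Days')
        .fillna(0).astype(int)
        .rename_axis(columns=None))
print(df)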

        0  1  2
Events         
Event1  1  1  2
Event2  2  0  0
Event3  3  4  0
Event4  3  1  0
Event5  4  0  0
jezrael

In addition to the other answers, here is an alternative that does not need pandas, using only the csv and collections modules:

import csv, collections

with open('data.txt') as f:
    reader = csv.DictReader(f, delimiter=' ', skipinitialspace=True)
    groups = collections.defaultdict(list)

    for r in reader:
        groups[r['Events']].append(str(r['Days']))

    max_len = len(max(groups.values(), key=len))
    for k, v in sorted(groups.items()):
        v += ['0'] * (max_len - len(v))
        print(k, '\t'.join(v), sep='\t')

The output:

Event1  1   1   2
Event2  2   0   0
Event3  3   4   0
Event4  3   1   0
Event5  4   0   0
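
To try the same idea without creating data.txt, the sketch below feeds the question's data through io.StringIO. The sample string and the helper name widen are hypothetical and only there for illustration; the grouping and padding logic follows the answer above.

```python
import collections
import csv
import io

# Hypothetical stand-in for data.txt: the question's data, space-separated.
sample = """Events Days
Event1 1
Event2 2
Event3 3
Event4 3
Event1 1
Event1 2
Event5 4
Event3 4
Event4 1
"""

def widen(lines):
    """Group Days by Events and pad every group with '0' to the same width."""
    reader = csv.DictReader(lines, delimiter=' ', skipinitialspace=True)
    groups = collections.defaultdict(list)
    for row in reader:
        groups[row['Events']].append(row['Days'])
    # Width of the widest group; default=0 covers empty input.
    max_len = max(map(len, groups.values()), default=0)
    return {k: v + ['0'] * (max_len - len(v)) for k, v in sorted(groups.items())}

table = widen(io.StringIO(sample))
for event, days in table.items():
    print(event, *days, sep='\t')
```

Returning a dict instead of printing inside the loop makes the result easy to check or reuse; any iterable of lines, including an open file, can be passed to widen.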
RomanPerekhrest