1

I want to convert a dataframe like:

id  event_type count

1      "a"       3
1      "b"       5
2      "a"       1
3      "b"       2

into a dataframe like:

id    a    b   a > b
1     3    5     0
2     1    0     1
3     0    2     0

Without using for-loops. What's a proper pythonic (Pandas-tonic?) way of doing this?

andandandand
  • 21,946
  • 60
  • 170
  • 271

2 Answers2

4

Well, not sure if this is exactly what you need or if it has to be more flexible than this. However, this would be one way to do it - assuming missing values can be replaced by 0.

import pandas as pd
from io import StringIO

# Creating and reading the data

data = """
id  event_type count
1      "a"       3
1      "b"       5
2      "a"       1
3      "b"       2
"""
df = pd.read_csv(StringIO(data), sep='\s+')

# Transforming

df_ = pd.pivot_table(df, index='id', values='count', columns='event_type') \
        .fillna(0).astype(int)
df_['a > b'] = (df_['a'] > df_['b']).astype(int)

Where df_ will take the form:

event_type  a  b  a > b
id                     
1           3  5      0
2           1  0      1
3           0  2      0
Stefan Falk
  • 23,898
  • 50
  • 191
  • 378
1

This can be split up into two parts.

Solution

df.set_index(
    [‘id’, ‘event_type’]
)[‘count’].unstack(
    fill_value=0
).assign(**{
    ‘a < b’: lambda d: d.eval(‘a < b’)
})
piRSquared
  • 285,575
  • 57
  • 475
  • 624