1

How to count items in each month, filling missing months with zero?

import pandas as pd

data = [
    {"event_date": "2018-08-10", "tags": ["tv", "radio"]},
    {"event_date": "2018-08-11", "tags": ["tv", "radio"]},
    {"event_date": "2018-09-10", "tags": ["tv"]},
    {"event_date": "2018-11-10", "tags": ["tv", "wifi"]},
]
df = pd.DataFrame(data)
# My attempt:
df.groupby([(df['event_date']).dt.month, df['tags']]).count()

What I expect:

month  tv  radio  wifi
8       2      2     0
9       1      0     0
10      0      0     0
11      1      0     1
sbha
CodeNinja

2 Answers

5

So this is a combined unnest, get_dummies, and reindex question:

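# Convert the date strings to month numbers (this overwrites event_date)
df.event_date=pd.to_datetime(df.event_date).dt.month

l=list(range(df.event_date.min(),df.event_date.max()+1))
(df.set_index('event_date').tags.apply(pd.Series).stack()
   .str.get_dummies()
   .groupby(level=0).sum()   # sum(level=0) was removed in pandas 2.0
   .reindex(l,fill_value=0))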
Out[834]: 
            radio  tv  wifi
event_date                 
8               2   2     0
9               0   1     0
10              0   0     0
11              0   1     1
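
For newer pandas versions, the same unnest, count, and reindex steps can be sketched with `DataFrame.explode` (available since pandas 0.25) and `pd.crosstab`. This is an alternative technique, not the code from the answer above; the names `long_df`, `months`, and `counts` are made up for illustration.

```python
import pandas as pd

data = [
    {"event_date": "2018-08-10", "tags": ["tv", "radio"]},
    {"event_date": "2018-08-11", "tags": ["tv", "radio"]},
    {"event_date": "2018-09-10", "tags": ["tv"]},
    {"event_date": "2018-11-10", "tags": ["tv", "wifi"]},
]
df = pd.DataFrame(data)

# Unnest: one row per (event_date, tag) pair, with a fresh unique index.
long_df = df.explode("tags").reset_index(drop=True)
months = pd.to_datetime(long_df["event_date"]).dt.month

# Count: cross-tabulate month against tag.
counts = pd.crosstab(months, long_df["tags"])

# Reindex: add the months that have no events, filled with zeros.
all_months = list(range(int(months.min()), int(months.max()) + 1))
counts = counts.reindex(all_months, fill_value=0)
print(counts)
```

The columns come out in alphabetical order (radio, tv, wifi), the same order as in the output above.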
BENY
3

A similar method to @Wen, but creating a new DataFrame to avoid the apply:

# pd.to_datetime makes this work when event_date still holds date strings
s = (pd.DataFrame(df.tags.values.tolist(),
        index=pd.to_datetime(df.event_date).dt.month).stack().reset_index(1, drop=True))

Now using pd.get_dummies:

# assumes `import numpy as np`; sum(level=0) was removed in pandas 2.0
(pd.get_dummies(s).groupby(level=0).sum()
    .reindex(np.arange(s.index.min(),s.index.max()+1), fill_value=0))

            radio  tv  wifi
event_date
8               2   2     0
9               0   1     0
10              0   0     0
11              0   1     1
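
Both answers reindex on the bare month number, which works for the sample data because every date falls in 2018. If the dates might span more than one year, one possible variation is to group by monthly periods instead. This is a sketch under that assumption, not part of either answer; `tags_long`, `periods`, `by_period`, and `full_range` are illustrative names.

```python
import pandas as pd

data = [
    {"event_date": "2018-08-10", "tags": ["tv", "radio"]},
    {"event_date": "2018-08-11", "tags": ["tv", "radio"]},
    {"event_date": "2018-09-10", "tags": ["tv"]},
    {"event_date": "2018-11-10", "tags": ["tv", "wifi"]},
]
df = pd.DataFrame(data)

# One row per tag, then map each date to its year-month period.
tags_long = df.explode("tags").reset_index(drop=True)
periods = pd.to_datetime(tags_long["event_date"]).dt.to_period("M")

# Indicator columns per tag, summed within each period.
by_period = pd.get_dummies(tags_long["tags"], dtype=int).groupby(periods).sum()

# Fill the gaps with every month between the first and last period.
full_range = pd.period_range(periods.min(), periods.max(), freq="M")
by_period = by_period.reindex(full_range, fill_value=0)
print(by_period)
```

The index is then a `PeriodIndex` (2018-08 through 2018-11 for this data) rather than plain integers, so December and January of adjacent years stay in order.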
user3483203
  • I feel like I am too lazy (writing done repeat) :-), BTW nice one – BENY Oct 03 '18 at 21:27
  • Mine could be much cleaner, I always forget you can sum across levels :P I'm also stealing your `fill_value` – user3483203 Oct 03 '18 at 21:28
  • Any idea how the performance of something like `df.tags.str.join('*').str.get_dummies(sep='*')` to get the dummies holds up? I'm always so clueless about the `.str` methods. – ALollz Oct 03 '18 at 21:30
  • Hmm, I'd have to try it out. I'd imagine it could be faster than expanding the lists to a new series, but I don't think it would beat creating the new DataFrame. String operations are unfortunately slow in pandas. – user3483203 Oct 03 '18 at 21:34
  • Yeah it gets pretty slow actually, your method is pretty fast :D – ALollz Oct 03 '18 at 21:34
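
For reference, the `.str.join` / `.str.get_dummies` idea from the comments can be sketched as below. It only shows that the approach gives the same counts on the sample data; it makes no timing measurement, and the commenters above report that it is slower. The names `month_key`, `dummies`, and `counts_join` are illustrative.

```python
import pandas as pd

data = [
    {"event_date": "2018-08-10", "tags": ["tv", "radio"]},
    {"event_date": "2018-08-11", "tags": ["tv", "radio"]},
    {"event_date": "2018-09-10", "tags": ["tv"]},
    {"event_date": "2018-11-10", "tags": ["tv", "wifi"]},
]
df = pd.DataFrame(data)

month_key = pd.to_datetime(df["event_date"]).dt.month

# Join each list into one '*'-separated string, then split it into indicator columns.
dummies = df["tags"].str.join("*").str.get_dummies(sep="*")

# Sum the indicators per month and fill in the months with no events.
counts_join = dummies.groupby(month_key).sum()
month_range = list(range(int(month_key.min()), int(month_key.max()) + 1))
counts_join = counts_join.reindex(month_range, fill_value=0)
print(counts_join)
```

The separator must be a character that never appears inside a tag, which is an assumption about the data.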