0

I am trying this problem but not getting the right solution.

So, I have a data which has City and Months mapped to them

City Month
A M1
A M2
B M3
B M4
C M5
C M8

I have created dummy variables and have marked them as binary in this manner

City M1 M2 M3 M4 M5 M8
A 1 0 0 0 0 0
A 0 1 0 0 0 0
B 0 0 1 0 0 0
B 0 0 0 0 0 1
C 0 0 0 1 0 0
C 0 0 0 0 1 0

Now, the main problem is, I want to mark each location to a month in a single row, like this

City M1 M2 M3 M4 M5 M8
A 1 1 0 0 0 0
B 0 0 1 0 0 1
C 0 0 0 1 1 0

Can anyone suggest how to move from table 2 to table 3 structure? I do not want to hard code them as different locations might get assigned random months in subsequent data. Getting dummy variables in easy but how do I get to last format? any useful functions existing in python for this?

Celius Stingher
  • 17,835
  • 6
  • 23
  • 53
  • 1
    What data type are you using for the above tables? Can you share please your code? – A. Maman Nov 16 '21 at 11:41
  • I think pivot_table with `len` not return indicator columns, but counts. Only if no duplicates get `0,1`, else 0,1,2,... counts. – jezrael Nov 16 '21 at 11:58

1 Answers1

0

I think this can be solved by using pivot_table(). The trick here is to use len as aggfunc.

df.pivot_table(index='City',columns='Month',aggfunc=len,fill_value=0).clip(1,0)

Outputs:

Month  M1  M2  M3  M4  M5  M8
City                         
A       1   1   0   0   0   0
B       0   0   1   1   0   0
C       0   0   0   0   1   1
Celius Stingher
  • 17,835
  • 6
  • 23
  • 53