Transform pandas Series into boolean matrix

Question

I have the following pd.Series object.

              ticker
date                
2020-02-06     BKBR3
2020-02-06     BRKM5
2020-02-06     CNTO3
..........    ......
2020-02-06    BIDI11
2020-02-06     BRPR3
2020-02-06     CVCB3
2020-02-06     ELET3
2020-02-06     GFSA3
2022-01-26    QETH11
2022-01-26     ABEV3
2022-01-26    BIDI11
2022-01-26     CRFB3
2022-01-26     LCAM3

And I want to turn it into a boolean matrix (not sure if I'm using the appropriate terminology), that looks like the table below. It has the same index values from the original table, but the ticker column is pivoted and the values are True when the column and index are in the original table and False when they aren't.

            BKBR3  BRKM5  CNTO3  .....
date                           
2020-02-06  True   False  False  .....
2020-02-10  False  False  False  .....
..........  .....  .....  .....

Does anyone know a clever way of doing it? I tried pivoting the table, but since it has no values, it didn't work.

score 1 · Answer 1 · answered Jan 26 '22 at 16:48

1

Managed to do it with the pivot method after some more trying.

df["bool"] = 1
df = df.groupby(["date", "ticker"]).sum()
df = df.reset_index()
df = df.pivot(index="date", columns="ticker", values="bool")
df = df.fillna(0)

answered Jan 26 '22 at 16:48

pelelter

518
4
14

piterbarg · Accepted Answer · 2022-01-26T17:34:18.353

1

The output your provided does not look like a series, it looks like a dataframe with index 'date' and one column 'value'. I assume this is what it is -- it is easy to go between the two obviously so you may need to make small changes to the below

anyway here is how you can then do it -- by making it a multi-index df that you then unstack:

(df.set_index(['ticker'], append=True)
    .assign(v = True)
    .unstack(level=1)
    .fillna(False)
)

output (for a short version you had the output for in your question)

    v
ticker      ABEV3   BIDI11  BKBR3   BRKM5   BRPR3   CNTO3   CRFB3   CVCB3   ELET3   GFSA3   LCAM3   QETH11
date                                                
2020-02-06  False   True    True    True    True    True    False   True    True    True    False   False
2022-01-26  True    True    False   False   False   False   True    False   False   False   True    True

Edit

If there is duplicate data (same ticker for the same date used more than once) than one can use groupby instead of set_index:

(df.reset_index()
    .assign(v = True)
    .groupby(['date','ticker']).first()
    .unstack(level=1)
    .fillna(False)
)

edited Jan 26 '22 at 17:34

answered Jan 26 '22 at 16:49

piterbarg

8,089
2
6
22

I tried your solution, but it raised `ValueError: Index contains duplicate entries, cannot reshape` when running the `unstack` method. – pelelter Jan 26 '22 at 16:59
I'm using version `1.3.5` of pandas – pelelter Jan 26 '22 at 17:02
I found the problem reading the comment on [this answer](https://stackoverflow.com/a/28652153/13574635). I had a duplicate index on my data. Thanks! – pelelter Jan 26 '22 at 17:09
@pelelter Glad you figured it out; I edited the answer to deal with duplicate indices if you legitimately need them – piterbarg Jan 26 '22 at 17:35

Transform pandas Series into boolean matrix

2 Answers2

Edit