Python - Updating values for different Array columns (Speed Improvement)

Question

I am trying to set values to True in a different set of columns for each row. I have an all-False DataFrame:

    Place1 Place2 ... PlaceN
Id1 False  False  ... False
Id2 False  False  ... False
 .
 .
 .
IdN False  False  ... False

And a Series with a list of places for each Id:

Id1 [Place1, Place2]
Id2 [Place4, Place54, PlaceN]
 .
 .
 .
IdN [Place1]

What I need is to set the values in columns Place1 and Place2 of row Id1 to True, and likewise for every other Id.

Currently I have working code that uses a loop:

for idx in df.index:
    # Set only the columns listed for this Id to True
    df.loc[idx, series[idx]] = True

But it is far too slow for over 60k Ids and 150 places. It currently takes hours, and I need it to finish in about the time it takes me to read a news article.

I have tried other approaches such as apply, but lambda functions do not allow assignments.

Since I already have a well-structured Series listing the columns for each row, it feels like there should be a vectorized way to index the DataFrame with those column lists, but I have not found one.
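The vectorized indexing described above can be sketched with NumPy fancy indexing: flatten the Series into (Id, place) pairs, convert both labels to integer positions, and assign True to every position in one step. This is a minimal sketch, not code from the thread; it assumes pandas 0.25 or later (for Series.explode), and df and series here are small hypothetical inputs in the question's shape.

```python
import pandas as pd

# Hypothetical small inputs in the shape described in the question
df = pd.DataFrame(False,
                  index=['Id1', 'Id2', 'Id3'],
                  columns=['Place1', 'Place2', 'Place3', 'Place4'])
series = pd.Series([['Place1', 'Place2'], ['Place4', 'Place2'], ['Place1']],
                   index=['Id1', 'Id2', 'Id3'])

# One row per (Id, place) pair; dropna removes the NaN an empty list produces
pairs = series.explode().dropna()

# Translate labels into integer positions (-1 means "label not found")
rows = df.index.get_indexer(pairs.index)
cols = df.columns.get_indexer(pairs.to_numpy())
found = (rows >= 0) & (cols >= 0)

# Fancy indexing sets every (row, col) pair at once, with no Python loop
values = df.to_numpy(copy=True)
values[rows[found], cols[found]] = True
df = pd.DataFrame(values, index=df.index, columns=df.columns)
```

Unknown labels get position -1 from get_indexer, and the found mask skips them; without the mask, -1 would silently address the last row or column.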

Thank you in advance for the help!

Thanks. Turns out my searching skills also need polishing. I appreciate the response! — Pablosky, Sep 12 '18 at 07:32

score 0 · Answer 1 · answered Sep 11 '18 at 16:45

You can use MultiLabelBinarizer from the scikit-learn (sklearn) library and feed it your Series directly. Here's a demo:

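import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# One list of places per Id
s = pd.Series([['Place1', 'Place2'], ['Place1', 'Place2', 'Place3'], ['Place2']],
              index=['Id1', 'Id2', 'Id3'])

mlb = MultiLabelBinarizer()

# fit_transform returns a 0/1 indicator matrix with one column per place
# (sorted, as in mlb.classes_); wrap it in a Boolean DataFrame keyed by Id
res = pd.DataFrame(mlb.fit_transform(s),
                   columns=mlb.classes_,
                   index=s.index).astype(bool)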

Result:

    Place1 Place2 Place3
Id1   True   True  False
Id2   True   True   True
Id3  False   True  False

This builds the whole Boolean DataFrame in a single call, which is more efficient than filling an existing DataFrame row by row via manual iteration.
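If scikit-learn is not available, or the result must keep every column of the existing all-False DataFrame (including places that no Id uses), a pandas-only sketch along the same lines is shown here. It is an alternative, not part of the original answer; it assumes pandas 0.25 or later for Series.explode, and all_places is a hypothetical stand-in for the existing DataFrame's columns.

```python
import pandas as pd

# Sample input from the answer: one list of places per Id
s = pd.Series([['Place1', 'Place2'], ['Place1', 'Place2', 'Place3'], ['Place2']],
              index=['Id1', 'Id2', 'Id3'])
# Hypothetical full column set of the existing all-False DataFrame
all_places = ['Place1', 'Place2', 'Place3', 'Place4']

# One row per (Id, place) pair; the Id label repeats in the index
exploded = s.explode()

# One indicator column per place, collapse repeated Ids with any(),
# then align rows and columns with the existing frame (missing -> False)
res = (pd.get_dummies(exploded)
         .groupby(level=0)
         .any()
         .reindex(index=s.index, columns=all_places, fill_value=False))
```

The reindex step keeps the column order of the existing frame, so the result can replace it directly.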
