List Comprehension & Speed Optimization

Question

I have a pandas DataFrame with two series/columns that I want to combine into a new series/column. I already have a for loop that does what I need, but I'd rather write it as a list comprehension, and I can't figure out how. My code also takes a considerable amount of time to run. I've read that list comprehensions run faster; is there a quicker way?

If a value from 'lead_owner' matches one of the distinct/unique values in 'agent_final', use that value. Otherwise, use the value from 'agent_final' in the same row.

my_list = []
# Pair each lead_owner value with the agent_final value in the same row
for x, y in zip(list(df['lead_owner']), list(df['agent_final'])):
    if x in set(df['agent_final']):
        my_list.append(x)
    else:
        my_list.append(y)
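To make the question reproducible, here is a self-contained version of the loop above run on a small DataFrame. The sample values are hypothetical test data; only the column names 'lead_owner' and 'agent_final' come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "lead_owner": ["a", "b", "c", "e"],
    "agent_final": ["1", "2", "a", "c"],
})

my_list = []
for x, y in zip(df["lead_owner"], df["agent_final"]):
    # Keep lead_owner when it is one of the agent_final values,
    # otherwise fall back to this row's agent_final.
    if x in set(df["agent_final"]):
        my_list.append(x)
    else:
        my_list.append(y)

df["combined"] = my_list
print(my_list)  # ['a', '2', 'c', 'c']
```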

did you try `df['concatenated_col'] = df['lead_owner'] + df['agent_final']` — ksha, Sep 30 '19 at 12:41
Looks like you want the intersection plus the agent list. Check this out: [SO Answer](https://stackoverflow.com/questions/18079563/finding-the-intersection-between-two-series-in-pandas) — lwileczek, Sep 30 '19 at 12:43
I don't want them concatenated. If the values from 'lead_owner' match the distinct/unique values from 'agent_final' use that value. Otherwise use the values from 'agent_final'. — Ryan Davies, Sep 30 '19 at 12:45

Nico Griffioen · Accepted Answer · 2019-09-30T12:53:46.647

score 2

Here is how to do this with a list comprehension:

my_list = [x if x in set(df['agent_final']) else y for (x,y) in zip(list(df['lead_owner']), list(df['agent_final']))]

It's hard to say exactly why your code runs slowly without knowing how large your data is.

One sure way to speed it up is to stop rebuilding the set every time you check whether x is in it. Writing set(df['agent_final']) inside the loop rebuilds the whole set for every row, so the work grows quadratically with the number of rows. Build the set once, outside the for loop or list comprehension:

agent_final_set = set(df['agent_final'])
my_list = [x if x in agent_final_set else y for (x,y) in zip(list(df['lead_owner']), list(df['agent_final']))]
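To see how much hoisting the set matters on your own machine, the sketch below times both versions with the standard timeit module. The synthetic data, its size, and the helper names `rebuild_each_row` and `build_once` are assumptions for illustration; no timings are claimed, since they depend on the hardware and the data.

```python
import random
import timeit

import pandas as pd

# Hypothetical synthetic data: 1,000 rows of short string labels.
random.seed(0)
n = 1_000
df = pd.DataFrame({
    "lead_owner": [f"agent{random.randrange(300)}" for _ in range(n)],
    "agent_final": [f"agent{random.randrange(200)}" for _ in range(n)],
})


def rebuild_each_row(df):
    # set(...) is evaluated once per row: O(n) work per row, O(n^2) overall.
    return [x if x in set(df["agent_final"]) else y
            for x, y in zip(df["lead_owner"], df["agent_final"])]


def build_once(df):
    # The set is built a single time; each membership check is O(1) on average.
    agent_final_set = set(df["agent_final"])
    return [x if x in agent_final_set else y
            for x, y in zip(df["lead_owner"], df["agent_final"])]


# Both versions must produce identical results before comparing speed.
assert rebuild_each_row(df) == build_once(df)
print("rebuild each row:", timeit.timeit(lambda: rebuild_each_row(df), number=3))
print("build once:      ", timeit.timeit(lambda: build_once(df), number=3))
```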

edited Sep 30 '19 at 12:53

answered Sep 30 '19 at 12:43

This worked, thanks! It was missing a round bracket at the end to close the zip() – Ryan Davies Sep 30 '19 at 12:53

score 1 · Answer 2 · answered Sep 30 '19 at 12:49

I removed some unnecessary code (the list() conversions) and moved the creation of the set out of the main loop. Let's see if this runs faster:

agents = set(df['agent_final'])
data = zip(df['lead_owner'], df['agent_final'])
result = [x if x in agents else y for x, y in data]
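One thing to watch in this version: in Python 3, `zip` returns a single-use iterator, so `data` is exhausted after the comprehension runs and reusing it yields nothing. A minimal sketch that wraps the same logic in a function, so a fresh iterator is created on every call; the function name `combine_columns`, its parameters, and the sample data are hypothetical.

```python
import pandas as pd


def combine_columns(df, preferred="lead_owner", fallback="agent_final"):
    """Hypothetical helper: prefer `preferred` when its value occurs in `fallback`."""
    agents = set(df[fallback])                 # built once per call
    data = zip(df[preferred], df[fallback])    # fresh iterator per call
    return [x if x in agents else y for x, y in data]


df = pd.DataFrame({"lead_owner": ["a", "b"], "agent_final": ["a", "z"]})

# zip objects can only be consumed once:
data = zip(df["lead_owner"], df["agent_final"])
first = list(data)
second = list(data)      # empty: the iterator was already consumed
print(first, second)     # [('a', 'a'), ('b', 'z')] []

df["result"] = combine_columns(df)
print(df["result"].to_list())  # ['a', 'z']
```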

score 1 · Answer 3 · answered Sep 30 '19 at 12:58

I would suggest you try pandas apply and share the performance:

agents = set(df['agent_final'])
df['result'] = df.apply(lambda x: x['lead_owner'] if x['lead_owner'] in agents else x['agent_final'], axis=1)

and call .to_list() on the result if you need a plain list.
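Since this answer asks for performance numbers, here is one hedged way to collect them: time the row-wise `apply` against the list comprehension from the other answers on the same data. With `axis=1`, `apply` calls the lambda once per row and passes each row as a Series, which usually adds noticeable per-row overhead, but measure on your own data. The synthetic data and the names `with_apply` / `with_comprehension` are assumptions for illustration.

```python
import timeit

import pandas as pd

# Hypothetical data: 10,000 rows cycling through a few labels.
n = 10_000
df = pd.DataFrame({
    "lead_owner": [f"a{i % 50}" for i in range(n)],
    "agent_final": [f"a{i % 30}" for i in range(n)],
})
agents = set(df["agent_final"])


def with_apply():
    # One lambda call per row; each row arrives as a Series.
    return df.apply(lambda r: r["lead_owner"] if r["lead_owner"] in agents
                    else r["agent_final"], axis=1)


def with_comprehension():
    return [x if x in agents else y
            for x, y in zip(df["lead_owner"], df["agent_final"])]


# Same values either way; .to_list() turns the apply result into a plain list.
assert with_apply().to_list() == with_comprehension()

print("apply:        ", timeit.timeit(with_apply, number=3))
print("comprehension:", timeit.timeit(with_comprehension, number=3))
```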

answered Sep 30 '19 at 12:58

ksha

score 0 · Answer 4 · answered Sep 30 '19 at 12:51

A one-liner with numpy.where:

import numpy as np

my_list = np.where(df.lead_owner.isin(df.agent_final), df.lead_owner, df.agent_final)

Simple example:

In [284]: df
Out[284]: 
  lead_owner agent_final
0          a           1
1          b           2
2          c           a
3          e           c

In [285]: np.where(df.lead_owner.isin(df.agent_final), df.lead_owner, df.agent_final)
Out[285]: array(['a', '2', 'c', 'c'], dtype=object)
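If you prefer to stay in pandas and keep the DataFrame index, `Series.where` performs the same selection as `np.where`: it keeps the calling Series' values where the condition is True and takes `other` elsewhere. This is a swapped-in pandas alternative rather than part of the original answer; the sketch reuses the sample frame above and checks the equivalence in code rather than assuming it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "lead_owner": ["a", "b", "c", "e"],
    "agent_final": ["1", "2", "a", "c"],
})

# Keep lead_owner where it occurs anywhere in agent_final, else use agent_final.
mask = df["lead_owner"].isin(df["agent_final"])
df["result"] = df["lead_owner"].where(mask, df["agent_final"])

print(df["result"].to_list())  # ['a', '2', 'c', 'c']

# Same values as the numpy.where one-liner, but as an index-aligned Series.
expected = np.where(mask, df["lead_owner"], df["agent_final"])
assert (df["result"].to_numpy() == expected).all()
```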