I have pandas code that iterates over a list of tuples, and I am trying to vectorize it.

The list of tuples I am iterating over looks like this:

[('Morden', 35672, 'Morden Hall Park, Surrey'),
 ('Morden', 73995, 'Morden Hall Park, Surrey'),
 ('Newbridge', 120968, 'Newbridge, Midlothian'),
 ('Stroud', 127611, 'Stroud, Gloucestershire')]

The working code that iterates over the tuples is:

for tuple_ in result_tuples:
    listing_looking_ins1.loc[:, 'looking_in'][
        (listing_looking_ins1.listing_id == tuple_[1]) &
        (listing_looking_ins1.looking_in == tuple_[0])] = tuple_[2]

I have tried to write a function to use with the apply method, but it does not work:

result_tuples_df = pd.DataFrame(result_tuples)

def replace_(row):
    row.loc[:, 'looking_in'][
        (listing_looking_ins1.listing_id == result_tuples_df[1]) &
        (listing_looking_ins1.looking_in == result_tuples_df[0])] = result_tuples_df[2]

listing_looking_ins1.apply(replace_, axis=1)

Thank you!

Dmitriy Grankin
  • What exactly is your expected output? Are you trying to get the last word from the third element in the tuples or what? – Samuel Nde Mar 27 '19 at 18:04

1 Answer

You can convert your list of tuples to a DataFrame and merge it with the original:

# Column names follow the tuple layout: (looking_in, listing_id, replacement value)
result_tuples_df = pd.DataFrame(result_tuples,
                                columns=['looking_in', 'listing_id', 'result'])

# With no `on` argument, merge joins on the columns both frames share
# (listing_id and looking_in); the default join is an inner join
df = listing_looking_ins1.merge(result_tuples_df)

print(df)

Output:

   listing_id  looking_in                    result
0       35672      Morden  Morden Hall Park, Surrey
1       73995      Morden  Morden Hall Park, Surrey
2      120968   Newbridge     Newbridge, Midlothian
3      127611      Stroud   Stroud, Gloucestershire

And then if you want to have the result in the looking_in column:

df.drop(columns='looking_in').rename(columns={'result': 'looking_in'})

Output:

   listing_id                looking_in
0       35672  Morden Hall Park, Surrey
1       73995  Morden Hall Park, Surrey
2      120968     Newbridge, Midlothian
3      127611   Stroud, Gloucestershire
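
Note that a plain merge is an inner join, so rows of the original frame that have no matching tuple are dropped, whereas the loop in the question leaves them untouched. If that matters, a left merge followed by fillna may be closer to the loop's behaviour. The following is a minimal sketch, assuming the tuples are ordered (looking_in, listing_id, replacement) as the loop in the question implies; the sample frame, the extra 'Leeds' row and the names merged and updated are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for listing_looking_ins1; the real frame is not shown in the question.
listing_looking_ins1 = pd.DataFrame({
    'listing_id': [35672, 73995, 120968, 127611, 999],
    'looking_in': ['Morden', 'Morden', 'Newbridge', 'Stroud', 'Leeds'],
})

result_tuples = [('Morden', 35672, 'Morden Hall Park, Surrey'),
                 ('Morden', 73995, 'Morden Hall Park, Surrey'),
                 ('Newbridge', 120968, 'Newbridge, Midlothian'),
                 ('Stroud', 127611, 'Stroud, Gloucestershire')]

# Column names follow the assumed tuple layout: (looking_in, listing_id, replacement).
result_tuples_df = pd.DataFrame(result_tuples,
                                columns=['looking_in', 'listing_id', 'result'])

# how='left' keeps every row of the original frame;
# rows without a matching tuple get NaN in 'result'.
merged = listing_looking_ins1.merge(result_tuples_df,
                                    on=['listing_id', 'looking_in'],
                                    how='left')

# Use the replacement where one exists, otherwise keep the old value.
merged['looking_in'] = merged['result'].fillna(merged['looking_in'])
updated = merged.drop(columns='result')
```

One caveat: if the same (listing_id, looking_in) pair appears more than once in the tuples, the left merge repeats the matching row, so you may want to deduplicate the tuples first.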

P.S. In your code you're setting values with:

listing_looking_ins1.loc[:,'looking_in'][...] = ...

This is chained indexing, which may set values on a copy rather than on the DataFrame itself. Please refer to "How to deal with SettingWithCopyWarning in Pandas?" for why and how you should avoid doing this.
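
If you do keep the loop, the usual way to avoid the chained assignment is to pass the row mask and the column label to a single .loc call. A minimal sketch, using a small made-up frame named df in place of listing_looking_ins1 and one tuple from the question:

```python
import pandas as pd

# Hypothetical sample data standing in for listing_looking_ins1.
df = pd.DataFrame({
    'listing_id': [35672, 73995, 120968],
    'looking_in': ['Morden', 'Morden', 'Newbridge'],
})

tuple_ = ('Morden', 35672, 'Morden Hall Park, Surrey')

# Build the boolean row mask once, then select rows and column in one .loc call,
# so the assignment is applied to df itself.
mask = (df['listing_id'] == tuple_[1]) & (df['looking_in'] == tuple_[0])
df.loc[mask, 'looking_in'] = tuple_[2]
```

Only the row with listing_id 35672 is changed here; the other 'Morden' row has a different listing_id and keeps its value.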

P.P.S. Since you asked about vectorisation and using apply, you may also want to have a look at this answer https://stackoverflow.com/a/24871316/6792743 on performance of different operations

perl