Hi Stack Overflow. I'm new here, so I'll try to write this question as clearly as I can.

I have the following DF:

    Adress                                      Postalcode  City
0   Hammer Landstraße 91 41460 Neuss            41460       Neuss
1   Grebbeberglaan 15 unit 5 B 3527VX Utrecht   3527VX      Utrecht
2   Brink 88b 7411 BX, Deventer Nederland       7411 BX     Deventer
3   Flevolaan 58 Postbus 399 1380 AJ Weesp      1380 AJ     Weesp

I'm trying to strip the Postalcode and City values out of the Adress column, but I don't know how.

My desired output will then be:

    Adress                                      Postalcode  City
0   Hammer Landstraße 91                        41460       Neuss
1   Grebbeberglaan 15 unit 5 B                  3527VX      Utrecht
2   Brink 88b Nederland                         7411 BX     Deventer
3   Flevolaan 58 Postbus 399                    1380 AJ     Weesp

I've tried the following:

df1['Adress'] = filter(lambda i: not regex.search(df1[Postalcode]), df1[str('Adress')])

However, this assigns a filter object to the Adress column instead of the cleaned strings.

I want to keep all rows and only update the Adress values, removing from them whatever appears in the other two columns. Can anyone help?

Any tips on how to ask better questions are always welcome!

Wiktor Stribiżew
Max
  • what's the logic that you want to use for the filtering? find rows where the postalcode is also in the address? – Yuca Dec 21 '20 at 15:54
  • Welcome Max, please see the link and adapt to your actual use case. It seems you want to filter some specific postal code (you can also use `df['column_name'].isin(PC_list)` if you want several different items). Once you have that filter you can select any column you need. – RichieV Dec 21 '20 at 15:59
  • I want to update the values of Adress column, essentially removing the values listed in the other two columns from the Adress column – Max Dec 21 '20 at 16:06
  • So I want to keep all rows but update (read: remove) the values of the Adress column based on the values of the other two columns – Max Dec 21 '20 at 16:07

1 Answer

Are you looking for something like this?

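import pandas as pd

# Two sample rows taken from the question
data = [['Hammer Landstraße 91 41460 Neuss', "41460", "Neuss"], ['Brink 88b 7411 BX, Deventer Nederland', "7411 BX", "Deventer"]]

df = pd.DataFrame(data, columns = ['Address', 'Postalcode', 'City'])

# Row by row, remove the postal code from the address
df['Address'] = df.apply(lambda x: x['Address'].replace(x['Postalcode'], ''), axis=1)
# Row by row, remove the city name from the address
df['Address'] = df.apply(lambda x: x['Address'].replace(x['City'], ''), axis=1)
# Remove the leftover comma and the space after it
df['Address'] = df.apply(lambda x: x['Address'].replace(', ', ''), axis=1)

The last line only removes the leftover comma. The three steps could be merged into a single line, but keeping them separate is easier to read. Note that the removed values leave their surrounding spaces behind, which is why the output contains double and trailing spaces.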

The output:

                  Address Postalcode      City
0  Hammer Landstraße 91        41460     Neuss
1    Brink 88b  Nederland    7411 BX  Deventer
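
If the leftover whitespace matters, the same idea can be written as one function that also normalizes spaces. This is only a sketch: `strip_parts` is a hypothetical helper name, the column is spelled `Adress` as in the question, and it assumes the postal code and city appear in the address exactly as written in their own columns. Because `str.replace` removes every occurrence of the substring, a city name that is also part of a street name would be removed from the street too.

```python
import pandas as pd

# The four rows from the question, with its original column spelling
df = pd.DataFrame(
    {
        "Adress": [
            "Hammer Landstraße 91 41460 Neuss",
            "Grebbeberglaan 15 unit 5 B 3527VX Utrecht",
            "Brink 88b 7411 BX, Deventer Nederland",
            "Flevolaan 58 Postbus 399 1380 AJ Weesp",
        ],
        "Postalcode": ["41460", "3527VX", "7411 BX", "1380 AJ"],
        "City": ["Neuss", "Utrecht", "Deventer", "Weesp"],
    }
)


def strip_parts(row):
    """Hypothetical helper: drop the row's postal code and city from its address."""
    text = row["Adress"]
    for part in (row["Postalcode"], row["City"]):
        # Plain substring removal; removes every occurrence of `part`
        text = text.replace(part, "")
    # Turn commas into spaces, then collapse runs of whitespace and trim the ends
    text = text.replace(",", " ")
    return " ".join(text.split())


df["Adress"] = df.apply(strip_parts, axis=1)
print(df)
```

With the sample data this leaves "Brink 88b Nederland" with a single space, which matches the desired output in the question.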
Ruli