Create a new column in pd dataframe using a formula

Question

let me rephrase my question:

I have the following dataset:

data = {
    'globalId': {0: 4388064, 1: 4388200, 2: 4399344, 3: 4400638, 4: 4401765, 5: 4401831},
    'publicatieDatum': {0: '2018-07-31', 1: '2018-09-24', 2: '2018-08-02', 3: '2018-08-04', 4: '2018-08-05', 5: '2018-08-06'}, 
    'postcode': {0: '1774PG', 1: '7481LK', 2: '1068MS', 3: '5628EN', 4: '7731TV', 5: '5971CR'},
    'koopPrijs': {0: 139000.0, 1: 209000.0, 2: 267500.0, 3: 349000.0, 4: 495000.0, 5: 162500.0}
}

df = pd.DataFrame(data)
print(df)

This is the code for the municipality name of the zipcode/postal code of the 1st row:

nomi.query_postal_code(["1774"])

Now, I want to add a column called 'Gemeente'.

This can be retreived using the following formule:

>>> nomi.query_postal_code(["postcode"])

The postcode above should indicate the 4 numbers of the postcode within the postcode column.

I have 2 questions:

How can i add a code that calculates the gemeente for all rows in the above dataframe, based on the 'postcode', as specified above.
How can this code be written that it only selects the first 4 digits in the postcode column.

Apologies and thanks!

Welcome to Stack Overflow. Could you please provide a Minimal reproducible example? https://stackoverflow.com/help/minimal-reproducible-example — intedgar, Nov 17 '21 at 14:50
Welcome, which column you have(show us example of values in the column) and what is the expected output? — Eitan Rosati, Nov 17 '21 at 14:51
It would be helpful if you share your data of municipality name, so that we can provide a way as you want. — Park, Nov 17 '21 at 14:51
And please don't share links to images or images but the data in a table format itself. Because otherwise it is not reproducible. — intedgar, Nov 17 '21 at 14:56
I added 2 pictures. The first snapshot indicates the zipcode per row, which is the input for the formula: >>> nomi.query_postal_code(["1774"]) . The second snapshot is the document from which the municipality name can be retrieved, based on the zip code code, which is processed in the nomi.query formula. — , Nov 17 '21 at 14:57
[Please do not post images of code, data, error messages, etc.](https://stackoverflow.com/help/how-to-ask) — Timus, Nov 17 '21 at 14:59
globalId publicatieDatum postcode koopPrijs volledigeOmschrijving soortWoning categorieObject bouwjaar indTuin perceelOppervlakte kantoor_naam_MD5hash aantalKamers aantalBadkamers energielabelKlasse globalId.1 oppervlakte datum_ondertekening 0 4388064 2018-07-31 1774PG 139000.0 Ruimte, vrijheid, en altijd het water en de we... <{woonboot}> <{vrijstaande woning}> <{Woonhuis}> 1971-1980 1 NaN 09F114F5C5EC061F6230349892132149 3 NaN NaN 4388064 62 2018-11-12 — , Nov 17 '21 at 15:01
Apollogies, how can I show the table without making a screenshot? — , Nov 17 '21 at 15:01
[How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/14311263). — Timus, Nov 17 '21 at 15:03
{'globalId': {}, 'publicatieDatum': {}, 'postcode': {}, 'koopPrijs': {}, 'volledigeOmschrijving': {}, 'soortWoning': {}, 'categorieObject': {}, 'bouwjaar': {}, 'indTuin': {}, 'perceelOppervlakte': {}, 'kantoor_naam_MD5hash': {}, 'aantalKamers': {}, 'aantalBadkamers': {}, 'energielabelKlasse': {}, 'globalId.1': {}, 'oppervlakte': {}, 'datum_ondertekening': {}} — , Nov 17 '21 at 15:22
I hope this satisfies criteria of a minimal reproducible example: {'globalId': 4388064, 'publicatieDatum': '2018-07-31', 'postcode': '1774PG', 'koopPrijs': 139000.0, 'volledigeOmschrijving': 'nrs': 3, 'aantalBadkamers': nan, 'energielabelKlasse': nan, 'globalId.1': 4388064, 'oppervlakte': 62, 'datum_ondertekening': '2018-11-12'} — , Nov 17 '21 at 15:36
Yes. Basically postcode is the Dutch translation of zipcode. Meaning that by filling in 4 digits of a postcode/zipcode, the municipality name should be retreived. Ill try and see if i can add the code for the 1st row in the code above. — , Nov 17 '21 at 17:20

score 0 · Answer 1 · answered Nov 17 '21 at 14:58

0

nomi['is_1774']=[v == '1774' for v in nomi['postal_code']]

answered Nov 17 '21 at 14:58

prof_FL

156
9

I get the following error: TypeError: 'Nominatim' object is not subscriptable – Nov 17 '21 at 15:07
Thank you for contributing an answer. Would you kindly edit your answer to include an explanation of your code? That will help future readers better understand what is going on, and especially those members of the community who are new to the language and struggling to understand the concepts. – Jeremy Caney Nov 18 '21 at 00:52

Timus · Accepted Answer · 2021-11-17T19:01:26.217

0

I'm assuming that you are working with the pgeocode library?

You could try:

import pandas as pd
import pgeocode

nomi = pgeocode.Nominatim('nl')

data = {
    'globalId': {0: 4388064, 1: 4388200, 2: 4399344, 3: 4400638, 4: 4401765, 5: 4401831},
    'publicatieDatum': {0: '2018-07-31', 1: '2018-09-24', 2: '2018-08-02', 3: '2018-08-04', 4: '2018-08-05', 5: '2018-08-06'}, 
    'postcode': {0: '1774PG', 1: '7481LK', 2: '1068MS', 3: '5628EN', 4: '7731TV', 5: '5971CR'},
    'koopPrijs': {0: 139000.0, 1: 209000.0, 2: 267500.0, 3: 349000.0, 4: 495000.0, 5: 162500.0}
}
df = pd.DataFrame(data)

df["postal_code"] = df.postcode.str[:4]
df = df.merge(
    nomi.query_postal_code(set(df.postal_code))[["postal_code", "state_name"]],
    on="postal_code", how="left"
)

Result:

   globalId publicatieDatum postcode  koopPrijs postal_code     state_name
0   4388064      2018-07-31   1774PG   139000.0        1774  Noord-Holland
1   4388200      2018-09-24   7481LK   209000.0        7481     Overijssel
2   4399344      2018-08-02   1068MS   267500.0        1068  Noord-Holland
3   4400638      2018-08-04   5628EN   349000.0        5628  Noord-Brabant
4   4401765      2018-08-05   7731TV   495000.0        7731     Overijssel
5   4401831      2018-08-06   5971CR   162500.0        5971        Limburg

Seems to be the better way. What field do you need?

Or do:

import pandas as pd
import pgeocode

nomi = pgeocode.Nominatim('nl')

data = ... as above ...
df = pd.DataFrame(data)

df["postal_code"] = df.postcode.str[:4]
df = df.merge(
    nomi.query_postal_code(set(df.postal_code)), on="postal_code", how="left"
)

Result:

   globalId publicatieDatum postcode  ...  latitude longitude accuracy
0   4388064      2018-07-31   1774PG  ...   52.8472    4.9702      6.0
1   4388200      2018-09-24   7481LK  ...   52.1536    6.7528      6.0
2   4399344      2018-08-02   1068MS  ...   52.3592    4.8052      6.0
3   4400638      2018-08-04   5628EN  ...   51.4801    5.4724      6.0
4   4401765      2018-08-05   7731TV  ...   52.5240    6.4208      6.0
5   4401831      2018-08-06   5971CR  ...   51.4198    6.1408      6.0

[6 rows x 16 columns]

Then just drop the columns you don't need.

edited Nov 17 '21 at 19:01

answered Nov 17 '21 at 17:28

Timus

10,974
5
14
28

Yes, this does not work unfortunately. nomi.query_postal_code(["postcode"]), in which postcode is the zipcode, should return the corresponding municipality name to which the zipcode belongs. – Nov 17 '21 at 17:43
Yes, apologies, you are completely right. It takes the zipcode as an integer. In the first row, it takes the zipcode 1774PG only as integer 1774. Therefore, I need to find out how to remove the 'PG' / characters from this zip code in all the columns in the dataframe. – Nov 17 '21 at 17:57
Thanks very much @Timus. Unfortunately I get the following error :AttributeError: 'ellipsis' object has no attribute 'postcode' – Nov 17 '21 at 18:30
thank you. Sorry for asking these questions, but by what do i have to replace it? I also get this error: 'str' object has no attribute 'postcode' – Nov 17 '21 at 18:41
Nevermind i got it! Had to restart Python! 1000x THX @Timus – Nov 17 '21 at 18:45
1 more thing, i need the county/municipality name, instead of the state name. How can this be adjusted? – Nov 17 '21 at 18:48
nevermind, also got this. thx ! :) – Nov 17 '21 at 18:49
In case that I also want to add the state name to this dataframe, so besides the place_name, also the state_name. How can I best adjust the code? – Nov 22 '21 at 12:59
@Timskouten I think replacing `nomi.query_postal_code(set(df.postal_code))[["postal_code", "state_name"]]` with `nomi.query_postal_code(set(df.postal_code))[["postal_code", "place_name", "state_name"]]` in the merge should do it? – Timus Nov 22 '21 at 13:09

Create a new column in pd dataframe using a formula

2 Answers2