How to use condition to fill data?

Question

I have the following problem. My DataFrame is built from this file: http://sigmaquality.pl/wp-content/uploads/2019/03/sample.csv

It has two columns: postal code and country code. Many cells in the country code column are null.

I know that a postal code matching the mask XX-XXX is a Polish code, so I can fill the corresponding empty cells with the symbol 'PL'. I don't know how to do it.

How to use condition to fill data?

anky · Answer 1 · 2019-03-25T16:08:23.237

5

Use groupby and ffill() with bfill():

df.groupby('POSTAL_CD').apply(lambda x: x.ffill().bfill())

   Unnamed: 0 POSTAL_CD COUNTRY
0         0.0    33-101      PL
1         1.0    277 32      CZ
2         2.0    72-010      PL
3         3.0    33-101      PL
4         4.0      7700      BE
5         5.0    72-010      PL
6         6.0    33-101      PL
7         7.0     10095      IT
8         8.0    33-101      PL
9         9.0    33-101      PL
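The same per-postal-code fill can also be written with transform, which touches only the COUNTRY column. This is a minimal sketch on made-up rows (the values below are assumptions, not the asker's file). A postal code whose country is never known stays missing, so this approach only helps when the same code appears elsewhere with a country.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real data lives in sample.csv
df = pd.DataFrame({
    "POSTAL_CD": ["33-101", "277 32", "33-101", "72-010", "33-101"],
    "COUNTRY": ["PL", "CZ", np.nan, np.nan, np.nan],
})

# Within each postal code, copy a known country forward and backward
df["COUNTRY"] = df.groupby("POSTAL_CD")["COUNTRY"].transform(
    lambda s: s.ffill().bfill()
)
print(df)
```

Here the rows with 33-101 pick up 'PL' from the first row, while 72-010 stays empty because no row supplies its country.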

edited Mar 25 '19 at 16:08

answered Mar 25 '19 at 16:00

anky


1

If you mean to chain bfill and ffill, you need apply here :-) – BENY Mar 25 '19 at 16:03
1

Looks better now :-) – BENY Mar 25 '19 at 16:09

BENY · Accepted Answer · 2019-03-25T16:19:01.813

5

Use np.where with str.match:

df['COUNTRY'] = np.where(df['POSTAL_CD'].str.match(r'\d{2}-\d{3}', na=False) & df['COUNTRY'].isnull(), 'PL', df['COUNTRY'])
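One caveat: str.match anchors the pattern only at the start of the string, so a longer value such as '33-1012' would also pass. A hedged variant using str.fullmatch (available in pandas 1.1 and later) is sketched below on made-up rows. Passing na=False treats missing postal codes as non-matching.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, not taken from sample.csv
df = pd.DataFrame({
    "POSTAL_CD": ["33-101", "33-1012", "7700", None, "72-010"],
    "COUNTRY": [np.nan, np.nan, "BE", np.nan, "PL"],
})

# True only when the whole value is two digits, a dash, three digits
is_polish = df["POSTAL_CD"].str.fullmatch(r"\d{2}-\d{3}", na=False)

# Fill 'PL' only where the pattern matches and COUNTRY is missing
df["COUNTRY"] = np.where(is_polish & df["COUNTRY"].isna(), "PL", df["COUNTRY"])
print(df)
```

Only the first row is filled: '33-1012' is too long for the mask, and the row without a postal code is left alone.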

edited Mar 25 '19 at 16:19

answered Mar 25 '19 at 16:01

BENY


score 3 · Answer 3 · answered Mar 25 '19 at 16:11


How about using the loc indexer:

df = pd.read_csv("sample.csv", sep=",", index_col=0)
df.loc[df["POSTAL_CD"].str.contains("-", na=False), "COUNTRY"] = "PL"
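Note that a bare dash test also matches other formats (Portuguese codes such as 1000-205 contain a dash too), and the assignment overwrites countries that are already filled in. A sketch that keeps the loc style but narrows the mask, using made-up rows:

```python
import pandas as pd

# Hypothetical rows; "1000-205" follows the Portuguese NNNN-NNN format
df = pd.DataFrame({
    "POSTAL_CD": ["33-101", "1000-205", "72-010", None],
    "COUNTRY": [None, "PT", None, None],
})

# Anchored pattern: exactly two digits, a dash, three digits
looks_polish = df["POSTAL_CD"].str.contains(r"^\d{2}-\d{3}$", na=False)

# Assign only to rows that match the pattern and have no country yet
df.loc[looks_polish & df["COUNTRY"].isna(), "COUNTRY"] = "PL"
print(df)
```

The Portuguese row keeps 'PT', and the row with no postal code stays empty.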

answered Mar 25 '19 at 16:11

jsgounot


score 2 · Answer 4 · answered Mar 25 '19 at 16:11

I wrote this code on the assumption that a Polish postal code must fit the mask [two digits]-[three digits], not merely contain a dash or be non-empty.

import re
import csv

# Compile the pattern for Polish postal codes: two digits, a dash, three digits
regexp = re.compile(r'[0-9]{2}-[0-9]{3}')

# Read the CSV and load it into memory, closing the file afterwards
with open('sample.csv', newline='') as f:
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames
    table = list(reader)

# Iterate over the rows
for row in table:
    # Fill COUNTRY only when it is empty and the whole postal code fits the pattern
    if not row['COUNTRY'] and regexp.fullmatch(row['POSTAL_CD']):
        row['COUNTRY'] = 'PL'

# Write the result, reusing the header read from the input
with open('result.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(table)
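For trying this out without touching files, the same row-by-row logic can be wrapped in a function that works on any file-like object. This is a sketch only: fill_polish is a hypothetical helper name and the inline CSV text is made up. It fills only empty COUNTRY cells and requires the whole postal code to match the mask.

```python
import csv
import io
import re

POLISH_CODE = re.compile(r"[0-9]{2}-[0-9]{3}")

def fill_polish(source, target):
    """Copy CSV rows from source to target, setting COUNTRY to 'PL' where it
    is empty and POSTAL_CD is exactly two digits, a dash, three digits."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if not row["COUNTRY"] and POLISH_CODE.fullmatch(row["POSTAL_CD"] or ""):
            row["COUNTRY"] = "PL"
        writer.writerow(row)

# Made-up input; the first header is empty, as in a CSV saved with an index column
source = io.StringIO(",POSTAL_CD,COUNTRY\n0,33-101,\n1,277 32,CZ\n2,7700,\n")
target = io.StringIO()
fill_polish(source, target)
print(target.getvalue())
```

Because rows are written as they are read, this version does not need to hold the whole table in memory.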

score 1 · Answer 5 · answered Jul 26 '20 at 06:34


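After a while I learned a bit more, and now I would do it like this:

# Temporary helper column ('Nowa' is Polish for 'new'): third character of the postal code
df['Nowa'] = df['POSTAL_CD'].str.slice(2, 3)
# 'PL' where that character is a dash, NaN otherwise
df['Nowa'] = df['Nowa'].apply(lambda x: 'PL' if x == '-' else np.nan)
# Fill only the missing COUNTRY values; assign back rather than using inplace on a column
df['COUNTRY'] = df['COUNTRY'].fillna(df['Nowa'])
del df['Nowa']
df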

answered Jul 26 '20 at 06:34

Wojciech Moszczyński

