How to fill missing values based on group using pandas

Question

I am looking at order data. Each order comes in at multiple lines depending on how many different items are part of the order. The table looks like this:

+--------------+------------------+-------+
| order number | shipping address | item  |
+--------------+------------------+-------+
| A123         | Canada           | boots |
+--------------+------------------+-------+
| A123         | null             | socks |
+--------------+------------------+-------+
| A123         | null             | laces |
+--------------+------------------+-------+
| B456         | California       | shirt |
+--------------+------------------+-------+

How can I fill the null values with the actual shipping address, etc. for that order, in this case 'Canada'? (Using python + pandas ideally)

Do you have a separate table where you have all the addresses mapped to orders? In that case you can follow this instructions https://stackoverflow.com/questions/53645882/pandas-merging-101 — UpmostScarab, Feb 17 '23 at 20:53
Is it always the previous one, or you have to infer it from the `order number` column? — Ignatius Reilly, Feb 17 '23 at 20:56
@IgnatiusReilly it is usually the previous one, but that is not 100% guaranteed — user14461410, Feb 17 '23 at 20:58
If it's the previous one, you can use [fillna](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html) with `method='ffill`. If it's not guaranteed... you need a table for conversion as suggested in a previous comment. [This](https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary) may help (select first the rows for which there're no null values). — Ignatius Reilly, Feb 17 '23 at 21:01

score 2 · Accepted Answer · answered Feb 17 '23 at 21:03

2

You need a dictionary of order number as the key and shipping address as the value, Just drop the NULLs and create a dict which you can map to the shipping address column.

di = df[['order number', 'shipping addres']]
di = di[di['shipping addres'].notnull()]
di.set_index('order number', inplace=True)
di = di.to_dict('index')
df['shipping addres'] = df['order number'].map(di)

answered Feb 17 '23 at 21:03

darth baba

1,277
5
13

I think this makes the most sense. Basically going order the issue of not having separate tables for order info and item info, by creating them. – user14461410 Feb 17 '23 at 21:31

Jamiu S. · Answer 2 · 2023-03-01T03:46:09.707

0

This is an approach using df.groupby() follow by .ffill() and .bfill()

df['shipping address'] = df.groupby('order number')['shipping address'].ffill().bfill()
print(df)

  order number shipping address   item
0         A123           Canada  boots
1         A123           Canada  socks
2         A123           Canada  laces
3         B456       California  shirt

edited Mar 01 '23 at 03:46

answered Feb 17 '23 at 21:06

Jamiu S.

5,257
5
12
34

1

What if after the groupby, the first element in the shipping address is NULL? it will fill all NULLS – darth baba Feb 17 '23 at 21:10

How to fill missing values based on group using pandas

2 Answers2