0

I am looking at order data. Each order comes in at multiple lines depending on how many different items are part of the order. The table looks like this:

+--------------+------------------+-------+
| order number | shipping address | item  |
+--------------+------------------+-------+
| A123         | Canada           | boots |
+--------------+------------------+-------+
| A123         | null             | socks |
+--------------+------------------+-------+
| A123         | null             | laces |
+--------------+------------------+-------+
| B456         | California       | shirt |
+--------------+------------------+-------+

How can I fill the null values with the actual shipping address, etc. for that order, in this case 'Canada'? (Using python + pandas ideally)

Jamiu S.
  • 5,257
  • 5
  • 12
  • 34
  • Do you have a separate table where you have all the addresses mapped to orders? In that case you can follow this instructions https://stackoverflow.com/questions/53645882/pandas-merging-101 – UpmostScarab Feb 17 '23 at 20:53
  • @UpmostScarab no, all I have is this one table. – user14461410 Feb 17 '23 at 20:55
  • Is it always the previous one, or you have to infer it from the `order number` column? – Ignatius Reilly Feb 17 '23 at 20:56
  • @IgnatiusReilly it is usually the previous one, but that is not 100% guaranteed – user14461410 Feb 17 '23 at 20:58
  • If it's the previous one, you can use [fillna](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html) with `method='ffill`. If it's not guaranteed... you need a table for conversion as suggested in a previous comment. [This](https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary) may help (select first the rows for which there're no null values). – Ignatius Reilly Feb 17 '23 at 21:01

2 Answers2

2

You need a dictionary of order number as the key and shipping address as the value, Just drop the NULLs and create a dict which you can map to the shipping address column.

di = df[['order number', 'shipping addres']]
di = di[di['shipping addres'].notnull()]
di.set_index('order number', inplace=True)
di = di.to_dict('index')
df['shipping addres'] = df['order number'].map(di)
darth baba
  • 1,277
  • 5
  • 13
  • I think this makes the most sense. Basically going order the issue of not having separate tables for order info and item info, by creating them. – user14461410 Feb 17 '23 at 21:31
0

This is an approach using df.groupby() follow by .ffill() and .bfill()

df['shipping address'] = df.groupby('order number')['shipping address'].ffill().bfill()
print(df)

  order number shipping address   item
0         A123           Canada  boots
1         A123           Canada  socks
2         A123           Canada  laces
3         B456       California  shirt
Jamiu S.
  • 5,257
  • 5
  • 12
  • 34