1

I have a data set of all customers purchases made in the last 5 years with the below columns as an example.

CUSTOMER ID | ORDER NUMBER | ORDER DATE (DateTime) | ITEM ID

What I need to do is assign each individual customer order position (i.e 1,2,3,4) - So for each customer I am able to know which order was their 1st 2nd or 3rd using the above criteria

Things to consider:

  1. There are multiple Customer ID's and Order Numbers in the same table
  2. There are multiple rows from the same customer & order number combination as each order has multiple items, so if the first order for a customer has 3 items on it I want all 3 items to show as 1.

Struggling to find a starting point of how to do this

Below is the data table:

import pandas as pd
df = pd.DataFrame({'Customer ID' : ['C100003','C100002','C100002','C100002','C100003','C100002'],
                    'Order Number' : ['RJSJ0053','RJSJ0060','RJSJ0085','RJSJ0085','RJSJ0089','RJSJ0092'],
                    'Order Date' : ['2023-05-08','2023-06-09','2023-06-13','2023-06-13','2023-06-13','2023-06-14'],
                    'Item ID' : ['Mouse','Keyboard','Computer','Monitor','Keyboard','Headset'],
                    'Order Position' : [1,1,2,2,2,3]})
RJL
  • 13
  • 3
  • 1
    A good starting point is posting some sample data and your desired output so we can help. – Stu Sztukowski Jun 13 '23 at 12:14
  • Apologies - I have added to the original post – RJL Jun 13 '23 at 12:40
  • 1
    Refrain from showing your dataframe as an image. Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Jun 13 '23 at 12:45
  • You can use stackexchange's table markdown or a site like [this](https://ozh.github.io/ascii-tables/) to format your table data to show correctly in your question. Images of text are frowned upon, as folks helping would have to transcribe the image back into text to replicate, and image links die over time making your question useless to future visitors. – JNevill Jun 13 '23 at 13:02
  • Ideally please create some code with dictionaries/lists or a table that can be copied/pasted (e.g. something that can be read in with `pd.read_clipboard())` – Stu Sztukowski Jun 13 '23 at 13:08
  • 1
    Hey all, thanks for the direction - hopefully the above is helpful. I have included the desired output (Order Position column). – RJL Jun 13 '23 at 14:28

1 Answers1

1

First, remove all duplicate [Customer ID, Order Number]. This will get you a table that looks like this:

  Customer ID Order Number  Order Date   Item ID
0     C100003     RJSJ0053  2023-05-08     Mouse
1     C100002     RJSJ0060  2023-06-09  Keyboard
2     C100002     RJSJ0085  2023-06-13  Computer
4     C100003     RJSJ0089  2023-06-13  Keyboard
5     C100002     RJSJ0092  2023-06-14   Headset

Now we can sort this by [Customer ID, Order Date], create a cumulative count grouped by Customer ID, then fill forward the missing values since those are multiple items in the same order.

df['Order Position'] = (df.drop_duplicates(['Customer ID', 'Order Number'])
                          .sort_values(['Customer ID', 'Order Date'])
                          .groupby('Customer ID')
                          .cumcount()+1
                       )

df['Order Position'] = (df.sort_values(['Customer ID', 'Order Date'])['Order Position']
                          .fillna(method='ffill')
                          .astype(int)
                       )
  Customer ID Order Number  Order Date   Item ID  Order Position
0     C100003     RJSJ0053  2023-05-08     Mouse               1
1     C100002     RJSJ0060  2023-06-09  Keyboard               1
2     C100002     RJSJ0085  2023-06-13  Computer               2
3     C100002     RJSJ0085  2023-06-13   Monitor               2
4     C100003     RJSJ0089  2023-06-13  Keyboard               2
5     C100002     RJSJ0092  2023-06-14   Headset               3
Stu Sztukowski
  • 10,597
  • 1
  • 12
  • 21
  • Thanks for the reply! With the initial step of removing the duplicate order number, it is 2 different items bought within the same order; in which cause I would want to keep it in. This way if I want to look at "The 4th order Customer X made" I will see all the items associated with that order. Is there a way to get the desired Order Position without groupby? – RJL Jun 13 '23 at 15:58
  • Check the updated solution. If you chain all of the operations together, you'll get the table that you need in the end. – Stu Sztukowski Jun 13 '23 at 16:20
  • Hey Stu - Thanks so much for the solution worked perfectly! – RJL Jun 14 '23 at 08:57
  • Great to hear! Be sure to mark this solution as accepted so other people can find it too. – Stu Sztukowski Jun 14 '23 at 09:59
  • Hi Stu - apologies, it did work but upon closer inspection on my actual data set, I noticed that the 2nd customer seems to be assigned Order 2 for the earliest order, then the next chronologically was assigned 1 not sure why that may be – RJL Jun 14 '23 at 10:02
  • If you can add some data that reproduces the issue to your original question, I can work on adjusting the solution to account for it – Stu Sztukowski Jun 14 '23 at 10:58
  • Sure thing - Will try to recreate the problem and re-post. – RJL Jun 15 '23 at 11:32
  • 1
    Hey Stu, I was referencing the wrong column. Your solution worked! Thanks again for the help. – RJL Jun 20 '23 at 09:13