2

enter image description here

I am a newbie in Tableau and have no ideas how to solve the following task:

I have a table. Each row contains fields with date (date), customer id (string), order id (string), price (numeric), quantity (numeric). But not all data (rows) contains the correct information. One order can contain (but not must!) several rows with different date. In that case, the correct rows are with the last date, the rest are just garbage. So, I need to get the correct quantities of orders and customers per day.

The algorithm is simple: to get last date of each order, to filter rows of each order with only this date – these are correct data (the rest are wrong). After that to calculate number of unique orders id and customers id per day.

In conclusion, to plot a diagram with quantity of orders per day with tooltips of customers’ quantity.

I can do it with Python (just several strings needed) and I know it can be done via LOD, but I can’t. It is like an absolute obstacle now for me(((

Maybe, Python's code clarify the approach:

for order in data['order_id'].unique() : 

correct_date = data.query('order_id == @order')['date'].max()

data = data.drop(index=data.query('order_id == @order and date != @correct_date').index)
Mikhail Le
  • 37
  • 3

1 Answers1

1

If I correctly understand your problem you can use LODs calculated fields to achieve your target. I purposely used two fields instead of one for ease of reading

Field: Last update of order

[date] = { FIXED [order id] : MAX([date])}

This field should be True only in the last row of each order. With this, you can extract only values on the last date.

Field: Last quantity per order

{ FIXED [order_id] : MIN(IF [date] = [Last update of order] THEN [quantity] END) }
Alex Blakemore
  • 11,301
  • 2
  • 26
  • 49
Nicolaesse
  • 2,554
  • 12
  • 46
  • 71
  • Ooo! Definitely, first step looks right, Thanks! There is the structure of table on the picture (a part which is needed to calculate). I can’t show, but the rows with different date and the same order_id exists. So, at the first step we get needed date. Just imagine, the first cell (date) has value 30/09/2018. In case we count unique orders a day, we have 68477 order both on 30/09 and 01/10, that means we count one order twice. How to avoid it and count order on day with last date only (for current case on 01/10, but not on 30/09)? – Mikhail Le Aug 17 '23 at 15:42
  • 1
    good approach - the first calc was right, I edited the second calc a bit to correct the comment from @"Mikhail Le" – Alex Blakemore Aug 17 '23 at 16:33
  • P.S. There is a problem with using a date field to order your records. You can't know which was the last one when multiple records fall on the same final day. It would be better to use a date-time field (timestamp) or even better some sort of sequence number. So this calc uses the smallest order quantity when there are multiple order revisions on the same last day. If you don't have a reliable distinct sequence field, you need some sort of tie breaking rule like that. – Alex Blakemore Aug 17 '23 at 16:38
  • And what about this way? CorrectDate: [date] = { FIXED [order_id] : MAX([date])} CorrectOrder: IF [CorrectDate] THEN [order_id] ELSE NULL END CountOfOrders: COUNTD([CorrectOrder]) and after all steps - to plot by CountOfOrders? – Mikhail Le Aug 17 '23 at 19:59