1

Before I get into the issue, here's a 2 second background: I've been working on this RFM analysis, and thanks to our peers, was finally able to output an RFM score for each customer_id in my data set, along with each of their individual R, F, and M scores. Here it is, if you're curious or would like to use it for yourself:

SELECT *,
    SUBSTRING(rfm_combined,1,1) AS recency_score,
    SUBSTRING(rfm_combined,2,1) AS frequency_score,
    SUBSTRING(rfm_combined,3,1) AS monetary_score
FROM (

SELECT
    customer_id,
    rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
FROM
    (SELECT
    customer_id,
    ntile(5) over (order by last_order_date) AS rfm_recency,
    ntile(5) over (order by count_order) AS rfm_frequency,
    ntile(5) over (order by total_spent) AS rfm_monetary
FROM
    (SELECT
    customer_id,
    MAX(oms_order_date) AS last_order_date,
    COUNT(*) AS count_order,
    SUM(quantity_ordered * unit_price_amount) AS total_spent
FROM 
    l_dmw_order_report
WHERE
    order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
    AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
    AND line_status NOT IN ('CANCELLED', 'HOLD')
    AND oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'

GROUP BY customer_id))

ORDER BY customer_id desc)

Here's an image: enter image description here

Now, my issue is that I need to keep my output in this kind of format, but to group the data by the Month and Year as well. I initially had grouped this data by customer_id, because I want the RFM and the individual scores to only appear by unique customer_id, but now I need it by the Month+Year and the customer_id (i.e. first column would be Jan 2018, then list all the unique customer_id rows for that month/year combo. Then Feb 2018, and so on). Anyone have any suggestions?

Thank you very much and let me know if you have any questions!!

Best, Z

Z41N
  • 97
  • 10
  • What RDBMS are you using? What is the date column from which you want to extract the month and year? – Antonio Alvarez Nov 26 '19 at 17:43
  • Hi Antonio. I am using Tableau to relate data. The date field is named oms_order_date and contains data that is all formatted in the 'YYYY-MM-DD' format. – Z41N Nov 26 '19 at 17:53

2 Answers2

1

If you want to group by year-month and customer_id, in that order, change your GROUP BY:

SELECT *,
    SUBSTRING(rfm_combined,1,1) AS recency_score,
    SUBSTRING(rfm_combined,2,1) AS frequency_score,
    SUBSTRING(rfm_combined,3,1) AS monetary_score
FROM (

SELECT
    YearMonth,
    customer_id,
    rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
FROM
    (SELECT
    YearMonth,
    customer_id,
    ntile(5) over (order by last_order_date) AS rfm_recency,
    ntile(5) over (order by count_order) AS rfm_frequency,
    ntile(5) over (order by total_spent) AS rfm_monetary
FROM
    (SELECT
    to_char(oms_order_date, 'YYYY-MM') AS YearMonth,
    customer_id,
    MAX(oms_order_date) AS last_order_date,
    COUNT(*) AS count_order,
    SUM(quantity_ordered * unit_price_amount) AS total_spent
FROM 
    l_dmw_order_report
WHERE
    order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
    AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
    AND line_status NOT IN ('CANCELLED', 'HOLD')
    AND oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'

GROUP BY to_char(oms_order_date, 'YYYY-MM'), customer_id))
ORDER BY YearMonth, customer_id desc)
Antonio Alvarez
  • 466
  • 1
  • 6
  • 20
  • Added as stated, but the output states "42703: column "year" does not exist in l_dmw_order_report" - I also tried year instead of 'year' and it gave the same error. – Z41N Nov 26 '19 at 18:15
  • The error you are giving is a PostgreSQL error. I changed my reply with a way of grouping by year-month on PosgreSQL. – Antonio Alvarez Nov 26 '19 at 18:20
  • Hm that's odd, I'm using Aginity Workbench for Redshift. Anyway, I tried your edit and now its stating "42703: column "oms_order_date" does not exist in derived_table2" – Z41N Nov 26 '19 at 18:30
  • What exact query did you try? That error means the table referred in the `FROM` (which as I see, it should be `l_dmw_order_report`) does not have a column named `oms_order_date`. – Antonio Alvarez Nov 26 '19 at 18:36
  • Posting the query as an answer as its too large. – Z41N Nov 26 '19 at 18:45
  • I rewrote your query because you set the `to_char(oms_order_date, 'YYYY-MM')` on the first select without anything else. – Antonio Alvarez Nov 26 '19 at 18:49
  • Thank you, Antonio. Your re-writing of the query has done the job. I was not aware that you can use "YearMonth" by itself as a column. I now understand that you defined YearMonth in one of the SELECT clauses, then grouped it and ordered it by YearMonth, and added it to the other SELECT clauses so that it appears. Thank you!! – Z41N Nov 26 '19 at 19:02
0

As requested by Antonio:

SELECT *,
    SUBSTRING(rfm_combined,1,1) AS recency_score,
    SUBSTRING(rfm_combined,2,1) AS frequency_score,
    SUBSTRING(rfm_combined,3,1) AS monetary_score
FROM (

SELECT
    to_char(oms_order_date, 'YYYY-MM'),
    customer_id,
    rfm_recency*100 + rfm_frequency*10 + rfm_monetary AS rfm_combined
FROM
    (SELECT
    customer_id,
    ntile(5) over (order by last_order_date) AS rfm_recency,
    ntile(5) over (order by count_order) AS rfm_frequency,
    ntile(5) over (order by total_spent) AS rfm_monetary
FROM
    (SELECT
    customer_id,
    MAX(oms_order_date) AS last_order_date,
    COUNT(*) AS count_order,
    SUM(quantity_ordered * unit_price_amount) AS total_spent
FROM 
    l_dmw_order_report
WHERE
    order_type NOT IN ('Sales Return', 'Sales Price Adjustment')
    AND item_description_1 NOT IN ('freight', 'FREIGHT', 'Freight')
    AND line_status NOT IN ('CANCELLED', 'HOLD')
    AND oms_order_date BETWEEN '2018-01-01' AND '2018-12-31'

GROUP BY to_char(oms_order_date, 'YYYY-MM'), customer_id))

ORDER BY customer_id desc)

LIMIT 100

Error is stating: "42703: column "oms_order_date" does not exist in derived_table2"

I know for a fact this is a column in this table. Confirmed using: SELECT oms_order_date FROM l_dmw_order_report

Z41N
  • 97
  • 10