1

I have some raw data that looks like this, after I exported it from R and manually added the Segment column in Excel.

Raw Data pulled into excel

Raw Data pulled into excel

I want to write code in R that will add the Segment for me.

Data description...

ID: Random distinct numbers
flag: Yes or No
Period: 0-12 Mths or 12-24 Mths
Spend Group: High, Medium or Low

Rules...

Flag = Yes
Period = 0-12 Mths
Spend Group = High
Then Segment = G01

Flag = Yes
Period = 0-12 Mths
Spend Group = Medium
Then Segment = G02

Flag = Yes
Period = 0-12 Mths
Spend Group = Low
Then Segment = G03

Flag = Yes
Period = 12-24 Mths
Spend Group = High
Then Segment = G04

...and so on until all of the variables have been accounted for

I want the code to recognise differences each time the data is processed eg, the flag won't always be Yes or No. Sometimes it will be Yes for every row.

My complete dataset summarised looks like the following image.

Summary Data

Summary Data

How would you start to code this in R?

Nithin
  • 1,376
  • 12
  • 29
Super_Si
  • 37
  • 8
  • 1
    First, you'll want to save it as a CSV, then read it into R as a data frame with ```df <- read.csv("C:/Users/you/path_to_file/file.csv")```. After that I'd recommend reading a tutorial on R. This is one of the first things you'll learn – rsoren Jul 13 '17 at 06:44
  • Thanks @rsoren. I already have all the data in R, this is the last step in processing. I don't want to have to type the rules out as there could be many more variables than those shown above. Ideally I'd like to create vectors or something similar for each condition and use these as the backbone for the rules eg, flag <- c("Yes","No"), recency <- c("0-12 Mths","12-24 Mths") etc. Can you recommend a tutorial. – Super_Si Jul 13 '17 at 07:00
  • 1
    Have you tried anything? One camp of users might suggest `dplyr` pipelines, while others would counter with `data.table` examples. It can also be done rather straight-forward in R, perhaps using `ifelse` or other vector-based operations. First, though, it would be much simpler to craft something if we had some data to play with, and I'm not inclined to transcribe from an image. There are good suggestions for crafting good questions at [help/mcve](https://stackoverflow.com/help/mcve) and [a popular q/a](https://stackoverflow.com/a/5963610/3358272). – r2evans Jul 13 '17 at 07:12

3 Answers3

1

On top of my head merge seems to be the right way.

combine = merge(raw_data, summary_data, by.x=c("flag","period","spend_group"), by.y =c("flag","recency","spend_band"))

anarchy
  • 551
  • 2
  • 23
  • @Super_Si kindly accept the answers as solution if it helped you – anarchy Jul 13 '17 at 12:54
  • See data link above. I think I need to play around with something like this as my values can change each time the data is loaded and using the field name rather than writing the values for each outcome would take this into account. Just not sure how to do it! – Super_Si Jul 13 '17 at 23:30
0

I'd do it with a four-step ifelse().

You reference your data within the four subsets Flag == "yes" & Recency == "0-12 Mths", Flag == "yes" & Recency == "12-24 Mths", Flag == "no" & Recency == "0-12 Mths" and Flag == "no" & Recency == "12-24 Mths" and launch ifelse() calls:

mydata$Segment[mydata$Flag == "yes" & mydata$Recency == "0-12 Mths"] <- with(mydata[mydata$Flag == "yes" & mydata$Recency == "0-12 Mths", ], 
  ifelse(Spend Band == "High", "G01", 
     ifelse(Spend Band == "Medium", "G02", "G03")))

mydata$Segment[mydata$Flag == "yes" & mydata$Recency == "12-24 Mths"] <- with(mydata[mydata$Flag == "yes" & mydata$Recency == "12-24 Mths", ], 
  ifelse(Spend Band == "High", "G04", 
     ifelse(Spend Band == "Medium", "G05", "G06")))

mydata$Segment[mydata$Flag == "no" & mydata$Recency == "0-12 Mths"] <- with(mydata[mydata$Flag == "no" & mydata$Recency == "0-12 Mths", ], 
  ifelse(Spend Band == "High", "G07", 
    ifelse(Spend Band == "Medium", "G08", "G09")))

mydata$Segment[mydata$Flag == "no" & mydata$Recency == "12-24 Mths"] <- with(mydata[mydata$Flag == "no" & mydata$Recency == "12-24 Mths", ], 
  ifelse(Spend Band == "High", "G10", 
    ifelse(Spend Band == "Medium", "G11", "G12")))

You will have to change variable names, as Spend Band is not a viable variable name in R and therefore I guess yours will be called Spend_Band or something similar.

LAP
  • 6,605
  • 2
  • 15
  • 28
  • 1
    Might be easier to make a separate data frame with the options and the outcomes and use dplyr::left_join() to get join the data and give Segment – Sarah Jul 13 '17 at 07:23
  • Yeah, unfortunately I'm not well versed in dplyr. – LAP Jul 13 '17 at 08:01
  • Thanks for your comment. See data link above. I think I need to play around with something like the answer provided by @abishek, as my values can change each time the data is loaded and using the field name rather than writing the values for each outcome would take this into account. Just not sure how to do it! – Super_Si Jul 13 '17 at 23:31
0

to expand on my comment above (sorry I was in a hurry), I'd recommend having your Rules in a csv (or just defined as a data frame, but if they are going to change a csv might be easiest) and read them in as a separate data.frame eg below. This would have every combination of flag, period and spend_group with the Segment you want to assign to each combination.

Assuming your data (with colnames "Unique.ID","flag","period","spend_group") is in df

library(dplyr)
df_withSegment <- left_join(df, rules)

as the "flag","period","spend_group" are the all the columns with common names it will assign the value of Segment in rules to each combination of these three columns.

rules eg dataframe (from csv or R)

enter image description here

Sarah
  • 3,022
  • 1
  • 19
  • 40
  • this worked perfectly... finaldf <- left_join(df, Rules, by = c("Flag" = "Flag","Period" = Period","SpendGroup" = "SpendGroup")). Thanks everyone. – Super_Si Jul 14 '17 at 04:50