
I have a dataset with a column that holds all the click data. There are 12 main keywords. Whenever one of these keywords is found in the data, the software should return the rows from that keyword up to the next stop keyword. The stop keywords are "Home" and the main keywords themselves.

For example:

column:
a
b
d
g
d
home
f
v
b
p
home

The keywords are b and f. The software should start from b and stop whenever it encounters home, b, or f; this gives the first output (b d g d home). It should then start again from f and stop at home, b, or f, giving the second output (f v). Finally, it should start again from b and stop at b, f, or home, giving the third output (b p home). Please help me with the code. Thank you!

unknown
  • What is the question? What have you tried? What errors, messages, or unexpected output or results are you seeing? – Richard Jun 17 '18 at 05:09
  • The question is how can I group observations based on start and stop keywords. I tried using package dplyr but I am new to R so I am not able to understand how to solve the question. – unknown Jun 17 '18 at 05:22
  • Please share data and code in a [reproducible manner](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Roman Luštrik Jun 17 '18 at 06:47
  • Why are the last four values one group instead of two? Shouldn't the `b` start a new group? What if you had values between `home` and the next `b` or `f`? What group should those orphaned letters be put into? – Tom Jun 17 '18 at 13:53
  • Sorry Tom, it was sample data. I just edited it. – unknown Jun 17 '18 at 15:10

2 Answers


(Update: changed the approach after the OP posted an entirely different requirement.)

Your new logic has a very simple implementation in R:

# default = "" keeps the fill value the same type as the character column
df$grp <- cumsum(df$col1 %in% keyword | dplyr::lag(df$col1, default = "") == "home")

which gives

> df
   col1 grp
1     a   0
2     b   1
3     d   1
4     g   1
5     d   1
6  home   1
7     f   2
8     v   2
9     b   3
10    p   3
11 home   3


Sample data:

df <- structure(list(col1 = c("a", "b", "d", "g", "d", "home", "f", 
"v", "b", "p", "home")), .Names = "col1", class = "data.frame", row.names = c(NA, 
-11L))

keyword <- c('b', 'f')
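
As a sketch that goes beyond the original answer, the `grp` column can be used to pull out each click sequence with base R's `split()`. This version assumes that `grp == 0` marks rows before the first keyword, which should be dropped, and it replaces `dplyr::lag()` with a hand-rolled shift so that no package is needed. The names `prev`, `in_group`, and `sequences` are illustrative only.

```r
# Sample data and keywords from above
df <- data.frame(col1 = c("a", "b", "d", "g", "d", "home", "f",
                          "v", "b", "p", "home"),
                 stringsAsFactors = FALSE)
keyword <- c("b", "f")

# Base-R stand-in for dplyr::lag(): shift the column down by one row
prev <- c("", head(df$col1, -1))

# A new group starts at every keyword, or on the row right after "home"
df$grp <- cumsum(df$col1 %in% keyword | prev == "home")

# Drop rows before the first group (grp == 0), then split into sequences
in_group <- df$grp > 0
sequences <- split(df$col1[in_group], df$grp[in_group])
print(sequences)
```

Note that under this rule any non-keyword rows that follow `home` would form a group of their own; the sample data has no such rows, so that case would need separate handling.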
Prem
  • Thank you Prem, the code works perfectly, but there is one additional thing that I forgot to mention in the question earlier. I have updated the question, so can you please help me with the update? The update is: the stop keyword is not only 'home' but also any start keyword. That is, if it encounters 'b', 'f', or 'home', it should stop. Your code gets the pattern up to home perfectly. – unknown Jun 17 '18 at 21:31
  • See the updated answer. – Prem Jun 18 '18 at 07:33

Here is one way to do it in SAS. Make a new GROUP variable to indicate which group a record is in. You need to make a variable to track whether you are currently inside a group. You could extend that to number the members of the group.

Updated to have b and f start new groups even when already in a group.

data have ;
  input column $ @@;
cards;
a 
b d g d home
x y z
f v
b p home
;

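data want ;
  set have ;
  /* group = running group number; member = position in the current group (0 = not in a group) */
  retain group 0 member 0 ;
  /* already inside a group: advance the member counter */
  if member then member+1;
  /* b or f always starts a new group, even when already inside one */
  if column in ('b','f') then do;
    member=1;
    group+1;
  end;
  /* write out only rows that belong to a group */
  if member then output;
  /* home closes the current group after it has been written */
  if column = 'home' then member=0;
run;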
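
For readers following along in R, here is a rough translation of the data step above. It is only a sketch: the loop mirrors the SAS statements one for one, and the helper names `cur_group` and `cur_member` are introduced here for illustration. The input repeats the rows of the `have` dataset, including the orphaned `x y z`.

```r
# Same rows as the SAS `have` dataset, including the orphans x, y, z
column <- c("a", "b", "d", "g", "d", "home", "x", "y", "z",
            "f", "v", "b", "p", "home")

group  <- integer(length(column))  # group number per row (0 = not in a group)
member <- integer(length(column))  # position within the group (0 = not in a group)
cur_group  <- 0L
cur_member <- 0L

for (i in seq_along(column)) {
  # Already inside a group: advance the member counter
  if (cur_member > 0L) cur_member <- cur_member + 1L
  # b or f always starts a new group
  if (column[i] %in% c("b", "f")) {
    cur_member <- 1L
    cur_group  <- cur_group + 1L
  }
  # Record the row only when it belongs to a group
  if (cur_member > 0L) {
    group[i]  <- cur_group
    member[i] <- cur_member
  }
  # home closes the current group after it has been recorded
  if (column[i] == "home") cur_member <- 0L
}

# Keep only the rows that belong to a group
want <- data.frame(column, group, member)[member > 0L, ]
print(want)
```

Unlike a pure `cumsum()` approach, this state-tracking version leaves rows between `home` and the next keyword out of every group.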


Tom
  • I am sorry but I believe I have one more small problem: the stop keywords apart from home can also be 'b' or 'f'. So that means if it encounters 'home','b' or 'f' it should stop and make a group till that observation. – unknown Jun 17 '18 at 17:53
  • You need to update the question with all of the rules and appropriate examples. – Tom Jun 17 '18 at 19:53
  • I am sorry I know I wasn't clear in giving all the rules but now I have updated the question. – unknown Jun 17 '18 at 20:07
  • Did you try just changing the last IF statement to treat `home`, `b`, and `f` as stop words? Perhaps with an extra test to not stop when `b` or `f` is the first member in the group? – Tom Jun 17 '18 at 20:10
  • Yup, I tried and got something like this. Dataset was: a b d g d home f v b g home. Results (column, group, member): b 1 1; f 2 1; b 3 1 – unknown Jun 17 '18 at 20:39
  • Just needed to remove `ELSE` so that `b` and `f` start a new group whenever they are seen. – Tom Jun 17 '18 at 22:39
  • Thank you !!! The code works properly and this will help me in doing clickstream data analysis. – unknown Jun 18 '18 at 00:08