
Sample Data

action   advertname
aLoad
bServed  Leanardo Vertical Tie Horizontal
click    Leanardo Vertical Tie Horizontal
aLoad
bServed  The Label Vineet
aLoad
aLoad
aLoad
aLoad
bServed  Clooney the label
close    Clooney the label
aLoad
aLoad
aLoad
bServed  Angad Vertical Clooney Horizontal
close    Angad Vertical Clooney Horizontal

I need to number the rows by advert name, comparing each row with the one before it. In Excel I generated the numbering with IF(Advertname3=Advertname2,Adblk2,Adblk2+1), which gives:

action   advertname                         AdBlk
aLoad                                       1
bServed  Leanardo Vertical Tie Horizontal   2
click    Leanardo Vertical Tie Horizontal   2
aLoad                                       3
bServed  The Label Vineet                   4
aLoad                                       5
aLoad                                       5
aLoad                                       5
aLoad                                       5
bServed  Clooney the label                  6
close    Clooney the label                  6
aLoad                                       7
aLoad                                       7
aLoad                                       7
bServed  Angad Vertical Clooney Horizontal  8
close    Angad Vertical Clooney Horizontal  8

I am working with clickstream data of more than a million rows. I am trying to create an ad number based on the advert name for sorting purposes, because the time is not recorded down to the second.

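# Column 14 holds the advert name, column 24 receives the ad block number
ID_Sort[1,24] <- 1
for(i in 2:nrow(ID_Sort))
{
  if(ID_Sort[i,14] == ID_Sort[(i-1),14])
  {
    # Same advert name as the previous row: keep the block number
    a <- ID_Sort[(i-1),24]
    ID_Sort[i,24] <- a
  }
  else
  {
    # Advert name changed: move on to the next block number
    a <- ID_Sort[(i-1),24]
    ID_Sort[i,24] <- a+1
  }
}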

This code works fine on the sample data and finishes quickly, but it takes a long time on 1 million+ rows. Please help me get rid of this delay. Is there any way to do this without a FOR loop?

sandeep
  • have you heard of `break`? – Amit Joki Mar 31 '14 at 07:45
  • No, can you help me understand it? I am new to programming. – sandeep Mar 31 '14 at 07:46
  • I don't know `r` mate – Amit Joki Mar 31 '14 at 07:47
  • I have not tried to run your code, but maybe `ifelse` would be faster. Consider posting a small artificial data set. – Mark Miller Mar 31 '14 at 07:50
  • How do I post an artificial data set? Where can I attach it? – sandeep Mar 31 '14 at 08:00
  • Read [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) about how to make a small reproducible example. You can edit your original question to include the data. – Thomas Mar 31 '14 at 08:15
  • One change that might speed up your code would be to eliminate the two lines that define `a` and use, for example, `ID_Sort[i,24] <- ID_Sort[(i-1),24]` and `ID_Sort[i,24] <- ID_Sort[(i-1),24]+1`. Another possibility might be to create a data set with only two columns and pass that data set through your loop instead of passing a data set with 24 columns since the loop only operates on two columns. – Mark Miller Mar 31 '14 at 08:20
  • Thanks @Thomas. But my data consists of many page URL links, so when I paste it into the question and save, I get a pop-up saying newbies are not allowed to enter more than 4 URLs. – sandeep Mar 31 '14 at 08:53
  • I think once you give us a reproducible example we could help you in no time. I hope you're not doing this operation on a 'data.frame'; convert your data into a matrix or use a library like 'data.table'. Second, I think it would be better to just index it in the loop and then sort your data once you have finished identifying your matches. Looking at what you're trying to do, maybe you should look into the '?rle' function. Since you're trying to identify the matches, look into 'rle(ID_Sort[,24])$lengths' – Pork Chop Mar 31 '14 at 09:01
  • @MarkMiller Sir, I removed 'a' but it is still taking a long time. I feel the second method you mentioned is good; I will try to work on it. – sandeep Mar 31 '14 at 09:05
  • @MarkMiller hello this is sample data – sandeep Mar 31 '14 at 10:19
  • @MarkMiller I have posted the test data please help on it now – sandeep Apr 01 '14 at 05:38
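
The `rle` suggestion in the comments above could be sketched roughly as follows. This is only an illustration: `advertname` is a small made-up vector standing in for column 14 of `ID_Sort` (the column the loop compares), not column 24 as written in the comment, and blank advert names are assumed to be stored as empty strings.

```r
# Hypothetical stand-in for the advert name column (column 14 of ID_Sort)
advertname <- c("", "A", "A", "", "B", "", "", "B")

# rle() finds runs of consecutive identical values
runs <- rle(advertname)

# Number the runs 1, 2, 3, ... and repeat each number for the length of its run
AdBlk <- rep(seq_along(runs$lengths), times = runs$lengths)
AdBlk
```

Each run of identical consecutive names gets one number, which is the same rule as the Excel formula in the question, without an explicit loop over rows.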

1 Answer


I am not sure whether this does what you want or whether it is faster than your code, but I suspect it might be, although creating two copies of the data set as below might be pretty inefficient.

ID_Sort <- read.table(text='
    a  1  10
    b  1  10
    c  2  20
    d  2  20
    e  3  30
    f  3  30
', header=FALSE)

ID_Sort.c <- ID_Sort[-nrow(ID_Sort),2:3]
ID_Sort.b <- ID_Sort[-1            ,2:3]

V3 <- ifelse(ID_Sort.b$V2 == ID_Sort.c$V2, ID_Sort.c$V3, ID_Sort.c$V3+1)

ID_Sort$V3 <- c(1,V3)
ID_Sort

  V1 V2 V3
1  a  1  1
2  b  1 10
3  c  2 11
4  d  2 20
5  e  3 21
6  f  3 30
Mark Miller
  • Hello, this is not what I wanted. – sandeep Apr 01 '14 at 05:37
  • Sample Data is Posted. – sandeep Apr 01 '14 at 05:38
  • @sandeep I suggest you reduce the size of your example data and also post the answer you want. That is what others mean by a reproducible example. The answer I posted was based on my best guess of what you wanted. If you want me to improve the answer I need a better understanding of what it is that you want. So post the desired answer. Until you do I do not think anyone else is going to try to help. – Mark Miller Apr 01 '14 at 06:19
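
For reference, the AdBlk column shown in the question can be reproduced without a row-by-row loop by comparing the advert name vector with a shifted copy of itself and taking a cumulative sum of the changes. The sketch below is one possible approach rather than anything posted in the thread. The data frame name `ads` is made up for illustration, and it assumes the blank advert names on aLoad rows are stored as empty strings, not `NA`.

```r
# Sample data from the question; "" stands in for the blank advert names
# (an assumption about how the blanks are stored)
ads <- data.frame(
  action = c("aLoad", "bServed", "click", "aLoad", "bServed",
             "aLoad", "aLoad", "aLoad", "aLoad", "bServed", "close",
             "aLoad", "aLoad", "aLoad", "bServed", "close"),
  advertname = c("", "Leanardo Vertical Tie Horizontal",
                 "Leanardo Vertical Tie Horizontal", "", "The Label Vineet",
                 "", "", "", "", "Clooney the label", "Clooney the label",
                 "", "", "", "Angad Vertical Clooney Horizontal",
                 "Angad Vertical Clooney Horizontal"),
  stringsAsFactors = FALSE
)

x <- ads$advertname

# TRUE wherever a row's advert name differs from the row above it
changed <- x[-1] != x[-length(x)]

# The first row starts block 1; every change adds 1 to the block number
ads$AdBlk <- cumsum(c(TRUE, changed))
ads
```

The comparison and `cumsum` each operate on the whole vector at once, so no per-row assignment into the data frame is needed. If the blanks were read in as `NA`, the `!=` comparison would return `NA` and the values would need to be replaced first.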