I have a data frame like the one below:

date    input   
....     org
....     Min 1
....     Min 1
....     Min 1
....     Min 2
....     Min 2
....     Min 3
....     org
....     org
....     Min 1
....     Min 2
....     Min 2
....     Min 3
....     Min 3
....     Min 4

I want to add another column that numbers each consecutive run of identical input values, like below:

date    input      Number_input
....     org           1
....     Min 1         2
....     Min 1         2
....     Min 1         2
....     Min 2         3
....     Min 2         3
....     Min 3         4
....     org           5
....     org           5
....     Min 1         6
....     Min 2         7
....     Min 2         7
....     Min 3         8
....     Min 3         8
....     Min 4         9

Can you help me? ;-)

Ian Campbell
Abdel_El
  • Can you please make your data reproducible? – Chris Ruehlemann May 07 '20 at 14:15
  • please share a reproducible example – Sotos May 07 '20 at 14:15
  • Hi Abdel, it's difficult to know exactly what you're looking for. Perhaps you could reformat your examples, or better yet, provide the output of `dput()`? You can edit your question and paste the output. You can surround it with three backticks (```) for better formatting. See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more info. – Ian Campbell May 07 '20 at 14:15
  • Thanks, @Ian Campbell, for formatting my post correctly. I think it's now clearer what I'm trying to get. – Abdel_El May 07 '20 at 14:27

3 Answers

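With dplyr:

library(dplyr)

# rle() gives the length of each run of identical values;
# each run's index is then repeated once per row in that run
df %>%
  mutate(Number_input = rle(input)$lengths %>% 
  {rep(seq_along(.), .)})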

Which gives:

   date  input Number_input
   <chr> <chr>        <int>
 1 ….    org              1
 2 ….    Min 1            2
 3 ….    Min 1            2
 4 ….    Min 1            2
 5 ….    Min 2            3
 6 ….    Min 2            3
 7 ….    Min 3            4
 8 ….    org              5
 9 ….    org              5
10 ….    Min 1            6
11 ….    Min 2            7
12 ….    Min 2            7
13 ….    Min 3            8
14 ….    Min 3            8
15 ….    Min 4            9
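
To see why this works, it may help to look at what rle() returns on its own. The following is a minimal base R sketch; the vector `input` here is a shortened, made-up version of the question's column, and `runs` and `number_input` are names chosen only for illustration:

```r
# Shortened, illustrative version of the question's input column
input <- c("org", "Min 1", "Min 1", "Min 2", "org", "org")

# rle() collapses consecutive duplicates into runs
runs <- rle(input)
runs$values   # the value of each run:   "org" "Min 1" "Min 2" "org"
runs$lengths  # rows spanned by each run: 1 2 1 2

# Number the runs 1..n and repeat each number by its run length
number_input <- rep(seq_along(runs$lengths), times = runs$lengths)
number_input
# [1] 1 2 2 3 4 4
```

Note that the second "org" run gets its own number (4) instead of reusing 1, because rle() only looks at consecutive values.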

dput:

structure(list(date = c("….", "….", "….", "….", "….", "….", "….", 
"….", "….", "….", "….", "….", "….", "….", "…."), input = c("org", 
"Min 1", "Min 1", "Min 1", "Min 2", "Min 2", "Min 3", "org", 
"org", "Min 1", "Min 2", "Min 2", "Min 3", "Min 3", "Min 4")), row.names = c(NA, 
-15L), class = c("tbl_df", "tbl", "data.frame"))

I found this solution from @mpettis here: "Increment by 1 for every change in column"

Matt

You can convert input to a factor, use diff on its integer codes to detect where the value changes, and take the cumsum of those change flags:

cumsum(c(TRUE, diff(unclass(factor(x$input)))!=0))
# [1] 1 2 2 2 3 3 4 5 5 6 7 7 8 8 9

Or compare the vector with a shifted copy of itself for inequality:

cumsum(c(TRUE, x$input[-1] != x$input[-nrow(x)]))
# [1] 1 2 2 2 3 3 4 5 5 6 7 7 8 8 9

Or use xtfrm instead of factor:

cumsum(c(TRUE, diff(xtfrm(x$input))!=0))
# [1] 1 2 2 2 3 3 4 5 5 6 7 7 8 8 9
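
If this is needed in more than one place, the shifted comparison could be wrapped in a small function. This is only a sketch: `run_id` is a hypothetical helper name, not a base R function, and the data frame below is a shortened, made-up version of the question's data. It assumes the column has no missing values, since `!=` returns NA for them and cumsum then propagates the NA.

```r
# Hypothetical helper: number consecutive runs of equal values in a vector.
# Assumes v contains no NA values.
run_id <- function(v) {
  # Without this guard an empty input would wrongly return 1
  if (length(v) == 0L) return(integer(0))
  # TRUE wherever an element differs from the one before it;
  # the first element always starts a new run
  changed <- c(TRUE, v[-1] != v[-length(v)])
  cumsum(changed)
}

# Shortened, illustrative data
x <- data.frame(
  input = c("org", "Min 1", "Min 1", "Min 2", "org", "org"),
  stringsAsFactors = FALSE
)
x$Number_input <- run_id(x$input)
x$Number_input
# [1] 1 2 2 3 4 4
```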
GKi

It seems you are looking for rleid() from data.table:

df$Number_input <- data.table::rleid(df$input)
df

   data input Number_input
1    ….   org            1
2    …. Min 1            2
3    …. Min 1            2
4    …. Min 1            2
5    …. Min 2            3
6    …. Min 2            3
7    …. Min 3            4
8    ….   org            5
9    ….   org            5
10   …. Min 1            6
11   …. Min 2            7
12   …. Min 2            7
13   …. Min 3            8
14   …. Min 3            8
15   …. Min 4            9

Reproducible data

df <- data.frame(
  data = "….",
  input = c(
    "org", "Min 1", "Min 1", "Min 1", "Min 2", "Min 2", "Min 3", 
    "org", "org", "Min 1", "Min 2", "Min 2", "Min 3", "Min 3", "Min 4"
  )
)
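
If the data is already a data.table, rleid() can also be used inside `[` to add the column by reference. Below is a minimal sketch, assuming the data.table package is installed (the guard simply skips the example otherwise). The object name `dt` and the extra `site` column are made up for illustration. It also shows that rleid() accepts several vectors and starts a new run whenever any of them changes:

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Illustrative data with a made-up grouping column `site`
  dt <- data.table(
    site  = c("A", "A", "A", "B", "B", "B"),
    input = c("org", "Min 1", "Min 1", "Min 1", "org", "org")
  )

  # Run id based on input only
  dt[, Number_input := rleid(input)]

  # Run id that also changes when site changes
  dt[, Number_site_input := rleid(site, input)]

  print(dt)
}
```

Here Number_input is 1 2 2 2 3 3, while Number_site_input is 1 2 2 3 4 4, because the switch from site A to site B starts a new run even though input stays "Min 1".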
s_baldur