
Here is my dataframe:

structure(list(a = c(1, 1, -1, 1, 1, 1, -1, 1, 1, 1, 1)), .Names = "a", row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

Now I want to add an identification column that acts like an index: it should start at id = 1 and increase by 1 each time a -1 appears in `a` (id = 2 from the first -1 onward, id = 3 from the second, and so on). Expected:

structure(list(a = c(1, 1, -1, 1, 1, 1, -1, 1, 1, 1, 1), b = c(1, 
1, 2, 2, 2, 2, 3, 3, 3, 3, 3)), .Names = c("a", "b"), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

Using the solution from "R add index column to data frame based on row values" didn't work for my needs.

SteveS

2 Answers


You can also do it like this: take the cumulative sum (`cumsum`) of the logical vector created by `a == -1` and add one to the result:

library(dplyr)

df1 %>%
  mutate(b = cumsum(a == -1) + 1)

Or with base R:

df1$b = cumsum(df1$a == -1) + 1

Result:

# A tibble: 11 x 2
       a     b
   <dbl> <dbl>
 1     1     1
 2     1     1
 3    -1     2
 4     1     2
 5     1     2
 6     1     2
 7    -1     3
 8     1     3
 9     1     3
10     1     3
11     1     3

Data:

df1 = structure(list(a = c(1, 1, -1, 1, 1, 1, -1, 1, 1, 1, 1)), .Names = "a", row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))
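For comparison, here is the same rule written as an explicit loop. It is only a sketch to show what `cumsum(a == -1) + 1` computes; it rebuilds `df1` as a plain `data.frame` so the snippet runs on its own, and the names `b_loop` and `b_cumsum` are illustrative. The vectorised one-liner above is the better choice in practice.

```r
# Data from the question (a plain data.frame is enough for this sketch)
df1 <- data.frame(a = c(1, 1, -1, 1, 1, 1, -1, 1, 1, 1, 1))

# Explicit loop: start at id 1 and bump the id whenever a == -1
b_loop <- numeric(nrow(df1))
id <- 1
for (i in seq_len(nrow(df1))) {
  if (df1$a[i] == -1) id <- id + 1
  b_loop[i] <- id
}

# Vectorised version from this answer
b_cumsum <- cumsum(df1$a == -1) + 1

# Both give 1 1 2 2 2 2 3 3 3 3 3
identical(b_loop, b_cumsum)
```

The loop makes the "keep the id until the next -1" behaviour explicit, which is exactly what the running sum of the TRUE/FALSE vector encodes.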
acylam
  • Please correct me if I am wrong: `cumsum` counts the -1 occurrences and keeps the value the same until another -1 comes in, which increases the id by 1? @useR – SteveS Jul 19 '18 at 14:48
  • @steves Correct. `a==-1` creates a logical vector of `TRUE` when `a==-1` and `FALSE` when `a != -1`. When you apply `cumsum` to it, it starts with 0 because the first row is not `-1` and only adds one if it encounters a `TRUE`. Since you wanted `b` to start with 1, I added `1` to the vector so all values are increased by 1 – acylam Jul 19 '18 at 14:53
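The explanation in the comment can be checked step by step on the question's `a` column. This is a minimal sketch; the intermediate names `is_break` and `counts` are made up for illustration.

```r
a <- c(1, 1, -1, 1, 1, 1, -1, 1, 1, 1, 1)

is_break <- a == -1           # logical: TRUE where a is -1
counts   <- cumsum(is_break)  # TRUE counts as 1, FALSE as 0; result is integer
b        <- counts + 1        # shift so the first group is 1 instead of 0

# values: FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
is_break
# values: 0 0 1 1 1 1 2 2 2 2 2
counts
# values: 1 1 2 2 2 2 3 3 3 3 3
b
```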

You can do it like this:

  1. Create a helper column that is 1 in the first row and in every row where `a` is -1, and 0 otherwise.

  2. Create the index column with `cumsum` and then drop the helper column.

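    library(dplyr)
    
    # df is the data frame from the question
    df %>%
      # helper = 1 in the first row and wherever a is -1, otherwise 0
      mutate(helper = ifelse(row_number() == 1, 1,
                             ifelse(a == -1, 1, 0))) %>%
      # the running total of helper is the group index
      mutate(index = cumsum(helper)) %>%
      select(-helper)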
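The same two steps can also be written in base R. This is a sketch that assumes the question's data is in a data frame called `df`. It keeps the helper as a standalone vector instead of a column, so there is nothing to drop afterwards.

```r
df <- data.frame(a = c(1, 1, -1, 1, 1, 1, -1, 1, 1, 1, 1))

# Step 1: helper is 1 in the first row and wherever a == -1, 0 elsewhere
helper <- as.numeric(seq_len(nrow(df)) == 1 | df$a == -1)

# Step 2: the running total of helper is the index
df$index <- cumsum(helper)

df$index
```

Because the first row always gets a 1, the index starts at 1 without the `+ 1` used in the other answer.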
    
kabr