I am trying to compare each row with the previous row. If the string matches, the row should keep the same number. If the string doesn't match, a counter stored in a separate column should be incremented.

This is what I have.

    Error   NIO belt full      
    Error   NIO belt full      
    Error   transport - x axis error [70MA1]   
    Error   loading - lifter hp [341BG4 + 341BG6] not reached
    Place   LF - NIO Station6 graph windows 2 out of
    Error   loading - scanner not ready    
    Error   loading - scanner not ready    

This is what I need:

[Image: What I need]

I have tried the following code

knime.out <- knime.in
n <- nrow(knime.in)
knime.in$record_number <- rep(0, n)
r_num <- 1
for (i in 1:n) {
  knime.in$record_number[i] <- r_num
  if(knime.in$"combined_string" == lag(knime.in$"combined_string",k=1:) {
    r_num <- r_num+1
  }
}

  • Hi and welcome! You'll get faster, better help if you make your question reproducible. See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for help on how to do that. Good luck! – jpsmith Mar 21 '23 at 18:36

3 Answers

This should do:

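library(dplyr)  # tibble(), mutate(), lag(), %>%
library(tidyr)  # replace_na()

df <- tibble(combined_string = c('a', 'a', 'b', 'c', 'd', 'd', 'e'))

# Flag rows that differ from the previous row, treat the first row (NA) as a change,
# then take the running total of the flags
df %>% 
  mutate(record_number = cumsum((combined_string != lag(combined_string)) %>% replace_na(1)))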

Output:

# A tibble: 7 × 2
  combined_string record_number
  <chr>                   <dbl>
1 a                           1
2 a                           1
3 b                           2
4 c                           3
5 d                           4
6 d                           4
7 e                           5

Explanation:

combined_string != lag(combined_string) checks whether each row differs from the previous row. The first row has no previous row, so its result is NA:

  combined_string example
  <chr>             <dbl>
1 a                    NA
2 a                     0
3 b                     1
4 c                     1
5 d                     1
6 d                     0
7 e                     1

I replace the NA with a 1, so I can compute the cumulative sum:

  combined_string example
  <chr>             <dbl>
1 a                     1
2 a                     0
3 b                     1
4 c                     1
5 d                     1
6 d                     0
7 e                     1

Then I get the cumulative sum, getting to the final result
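The same three steps (flag the changes, treat the first row as a change, take the cumulative sum) can also be sketched in base R without dplyr or tidyr. This is only an illustration, not the code from the answer above; the names x, changed and record_number are made up for the example.

```r
# Hypothetical input vector, same values as the example tibble above
x <- c("a", "a", "b", "c", "d", "d", "e")

# Compare each element with the one before it; the first element has no
# predecessor, so it is flagged TRUE by hand (the role replace_na(1) plays above)
changed <- c(TRUE, x[-1] != x[-length(x)])

# The running total of the flags gives one number per run of equal values
record_number <- cumsum(changed)
print(record_number)
```

Note that this sketch assumes x has no NA values; a comparison with NA yields NA, and cumsum() would propagate it to all later rows.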

Juan C

Using data provided by @Andre Wildberg, you might use base R by specifying a for-loop like

# create new column first 
df$record_number <- 1

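# seq_len(...)[-1] is empty for a one-row data frame, unlike 2:nrow(df)
for (i in seq_len(nrow(df))[-1]) {
  if (df[i, "combined_string"] == df[i-1, "combined_string"]) {
    # same string as the previous row: keep the same number
    df$record_number[i] <- df$record_number[i-1]
  } else {
    # string changed: increment the counter
    df$record_number[i] <- df$record_number[i-1] + 1
  }
}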

gives

> df
     hm                                   combined_string record_number
1 Error                                     NIO belt full             1
2 Error                                     NIO belt full             1
3 Error                  transport - x axis error [70MA1]             2
4 Error loading - lifter hp [341BG4 + 341BG6] not reached             3
5 Place          LF - NIO Station6 graph windows 2 out of             4
6 Error                       loading - scanner not ready             5
7 Error                       loading - scanner not ready             5
Pax

An approach using data.table's rleid:

library(data.table)

cbind(df, record_number = rleid(df$combined_string))
     hm                                   combined_string record_number
1 Error                                     NIO belt full             1
2 Error                                     NIO belt full             1
3 Error                  transport - x axis error [70MA1]             2
4 Error loading - lifter hp [341BG4 + 341BG6] not reached             3
5 Place          LF - NIO Station6 graph windows 2 out of             4
6 Error                       loading - scanner not ready             5
7 Error                       loading - scanner not ready             5
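
If installing data.table is not an option, a run-length id can be sketched in base R with rle(). This is an assumption-labelled alternative rather than part of the answer above; the vector x simply repeats the combined_string values from the Data section, and runs and record_number are names chosen for the example.

```r
# Values copied from the combined_string column in the Data section
x <- c("NIO belt full", "NIO belt full",
       "transport - x axis error [70MA1]",
       "loading - lifter hp [341BG4 + 341BG6] not reached",
       "LF - NIO Station6 graph windows 2 out of",
       "loading - scanner not ready", "loading - scanner not ready")

# rle() collapses consecutive equal values into runs and stores their lengths
runs <- rle(x)

# Number the runs 1, 2, 3, ... and repeat each number for the length of its run
record_number <- rep(seq_along(runs$lengths), runs$lengths)
print(record_number)
```

One difference worth knowing: base rle() treats every NA as its own run, so results may differ from rleid() on columns that contain missing values.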

With dplyr >= 1.1.0

library(dplyr)

cbind(df, record_number = consecutive_id(df$combined_string))
     hm                                   combined_string record_number
1 Error                                     NIO belt full             1
2 Error                                     NIO belt full             1
3 Error                  transport - x axis error [70MA1]             2
4 Error loading - lifter hp [341BG4 + 341BG6] not reached             3
5 Place          LF - NIO Station6 graph windows 2 out of             4
6 Error                       loading - scanner not ready             5
7 Error                       loading - scanner not ready             5

Data

df <- structure(list(hm = c("Error", "Error", "Error", "Error", "Place", 
"Error", "Error"), combined_string = c("NIO belt full", "NIO belt full", 
"transport - x axis error [70MA1]", "loading - lifter hp [341BG4 + 341BG6] not reached", 
"LF - NIO Station6 graph windows 2 out of", "loading - scanner not ready", 
"loading - scanner not ready")), row.names = c(NA, -7L), class = "data.frame")
Andre Wildberg