
I have a problem where I need to, ideally, create new values and new rows based on the length of a string.

This is my source data:

NumericCode1=c("12345","1234")
NumericCode2=c("0123.45","123.4")
AlphaCode=c("","")
df=data.frame(NumericCode1,NumericCode2,AlphaCode)

What I'd like to do is process this data using this logic:

If either NumericCode1 or NumericCode2 contains more than 5 digits (counting digits only, so the decimal point is ignored), then I'd like to populate AlphaCode with the value AA:BB:CC for that row. So the df would end up looking like this:

NumericCode1=c("12345","1234")
NumericCode2=c("0123.45","123.4")
AlphaCode=c("AA:BB:CC","")
df=data.frame(NumericCode1,NumericCode2,AlphaCode)

Then I could use this code to create a separate record for each and would get my desired output.

library(tidyr)  # provides separate_rows() and re-exports %>%
df %>% 
  separate_rows(AlphaCode, sep=":")

  NumericCode1 NumericCode2 AlphaCode
1        12345      0123.45        AA
2        12345      0123.45        BB
3        12345      0123.45        CC
4         1234        123.4          

My problem is that I'm stuck at the first step. I can count the characters in the strings using nchar or str_length, but I cannot figure out how to "count if > 5 then do this".

Any help is much appreciated. Thanks!

Seth Brundle
    I don't think I understand what you mean by "populate AlphaCode with AA:BB:CC values for each". Is there a logic to the AlphaCode inserted into the appropriate rows? – divibisan Jan 14 '19 at 21:46

2 Answers


You can use replace

# gsub() strips every non-digit; sub() would remove only the first one
cond <- nchar(gsub("\\D", "", df$NumericCode1)) > 5 | nchar(gsub("\\D", "", df$NumericCode2)) > 5
df$AlphaCode <- replace(df$AlphaCode,
                        cond,
                        "AA:BB:CC")
df
#  NumericCode1 NumericCode2 AlphaCode
#1        12345      0123.45  AA:BB:CC
#2         1234        123.4          

The condition is TRUE when either NumericCode1 or NumericCode2 contains more than 5 digits (counting digits only); for those rows, replace() swaps the "" for AA:BB:CC.
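One detail worth checking when counting digits this way: sub() replaces only the first match, while gsub() replaces every match. The sketch below is illustrative only; the helper name count_digits and the third sample value are assumptions, not part of the original data.

```r
# Hypothetical helper: strip every non-digit character, then count what is left.
count_digits <- function(x) nchar(gsub("\\D", "", x))

# The first two values come from the question; the third is made up
# to show a string with more than one non-digit character.
codes <- c("12345", "0123.45", "12-34.56")
digit_counts <- count_digits(codes)

# sub() removes only the first non-digit ("-"), so the "." is still counted.
first_only <- nchar(sub("\\D", "", "12-34.56"))

print(digit_counts)
print(first_only)
```

For values with at most one non-digit character, such as the two in the question, both functions give the same count.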

data

df = data.frame(NumericCode1, NumericCode2, AlphaCode, stringsAsFactors = FALSE)
#                                                      ^^^^^^^^^^^^^^^^^^^^^^^^
markus

Using stringr::str_count with the pattern \\d, we can count digits only:

library(dplyr)
library(stringr)
df %>% mutate(Cond=if_else(str_count(NumericCode1,'\\d')>5|str_count(NumericCode2,'\\d')>5 ,
                           'AA:BB:CC',''))

  NumericCode1 NumericCode2     Cond
1        12345      0123.45 AA:BB:CC
2         1234        123.4         
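To tie this back to the question, the same condition could write into AlphaCode directly and feed separate_rows() in one pipeline. This is a sketch that assumes the question's sample data and assumes dplyr, stringr, and tidyr are installed; the name result is only for illustration.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Sample data from the question, kept as character columns.
df <- data.frame(
  NumericCode1 = c("12345", "1234"),
  NumericCode2 = c("0123.45", "123.4"),
  AlphaCode    = c("", ""),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Fill AlphaCode where either code has more than 5 digits; keep it otherwise.
  mutate(AlphaCode = if_else(
    str_count(NumericCode1, "\\d") > 5 | str_count(NumericCode2, "\\d") > 5,
    "AA:BB:CC",
    AlphaCode
  )) %>%
  # Split the colon-separated codes into one row per code.
  separate_rows(AlphaCode, sep = ":")

print(result)
```

With the sample data, this should give the four rows shown as the desired output in the question.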
A. Suliman