How to highlight sequential strings in one column based on another column

Question

My data is

    df <- structure(list(M1 = c(4L, 11L, 11L, 11L, 11L, 11L, 11L, 16L,
    16L, 16L, 16L, 16L, 16L, 16L), M2 = structure(c(14L, 1L, 2L,
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L), .Label = c(" B135",
    " B168", " B172", " B299", " B300", " B301", " B335", " B336",
    " B364", " B566", " B567", " B590", " B591", "A"), class = "factor"),
        N = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
        1L), N2 = c(470L, 14L, 12L, 16L, 9L, 14L, 14L, 24L, 15L,
        32L, 193L, 76L, 10L, 9L)), .Names = c("M1", "M2", "N", "N2"
    ), class = "data.frame", row.names = c(NA, -14L))

The data looks like this:

    > df
    #   M1    M2 N  N2
    #1   4     A 1 470
    #2  11  B135 1  14
    #3  11  B168 1  12
    #4  11  B172 1  16
    #5  11  B299 1   9
    #6  11  B300 1  14
    #7  11  B301 1  14
    #8  16  B335 1  24
    #9  16  B336 1  15
    #10 16  B364 1  32
    #11 16  B566 1 193
    #12 16  B567 1  76
    #13 16  B590 1  10
    #14 16  B591 1   9

What I want to do is group the rows by M1 and, within each group, highlight the M2 values that are sequential. In this example:

    #   M1    M2  N  N2
    #1   4    A   1  470

There is only one row with M1 = 4, so nothing needs to be highlighted.

    #2  11  B135 1  14
    #3  11  B168 1  12
    #4  11  B172 1  16
    #5  11  B299* 1   9
    #6  11  B300* 1  14
    #7  11  B301* 1  14

In this section (all rows with M1 = 11), B299, B300 and B301 are sequential (they follow one after another), so I want to highlight them, for example with a star.

    #8  16  B335* 1  24
    #9  16  B336* 1  15
    #10 16  B364  1  32
    #11 16  B566**  1 193
    #12 16  B567**  1  76
    #13 16  B590***  1  10
    #14 16  B591***  1   9

In this section (all rows with M1 = 16), B335 and B336 are sequential, so I highlight them with one star. B566 and B567 are also sequential, but they get two stars because they form a different run from the first one. The same goes for the third sequential group, and so on.

In the last section you have `B335*` and `B336*`, but in your example there is no `B336`. Is that a typo? – Pierre L, Mar 02 '16 at 14:16

NicE · Accepted Answer · 2016-03-02T15:01:40.087

Here's an attempt; it assumes the values are sorted as in your example:

    highlight_seq <- function(x) {
        # work on plain strings, even if x is a factor
        x <- as.character(x)

        # 1 where the numeric part is exactly one more than the previous entry,
        # else 0; entries without digits give NA, treated as "not sequential"
        num_seq <- (diff(as.numeric(gsub("\\D", "", x))) == 1) * 1
        num_seq[is.na(num_seq)] <- 0

        # run-length encode to find each run of 1s
        num_seq <- rle(num_seq)

        # number the runs: replace the 1s by their running count (1, 2, 3, ...)
        is_run <- num_seq$values != 0
        num_seq$values[is_run] <- cumsum(num_seq$values)[is_run]
        num_seq <- inverse.rle(num_seq)

        # diff() drops one element, so pad with 0 and copy each run's number
        # back onto the first element of that run
        num_seq <- c(0, num_seq)
        num_seq[which(num_seq != 0) - 1] <- num_seq[which(num_seq != 0)]

        # append as many asterisks as the run number
        paste0(x, strrep("*", num_seq))
    }

    library(dplyr)

    # apply the function separately within each M1 group
    df %>% group_by(M1) %>% mutate(M2 = highlight_seq(M2))


    M1      M2 N  N2
1   4       A 1 470
2  11    B135 1  14
3  11    B168 1  12
4  11    B172 1  16
5  11   B299* 1   9
6  11   B300* 1  14
7  11   B301* 1  14
8  16    B335 1  24
9  16   B363* 1  15
10 16   B364* 1  32
11 16  B566** 1 193
12 16  B567** 1  76
13 16  B568** 1  10
14 16  B569** 1   9
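
To make the run-numbering steps easier to follow, here is a minimal trace on the numeric parts of the M1 == 16 rows from the question's data. It is only a sketch: the variable names (`nums`, `step`, `run_id`, `stars`) are illustrative and not part of the answer, and `strrep()` (base R 3.3.0 or later) is used to build the asterisks.

```r
# Numeric parts of M2 for the M1 == 16 rows in the question
nums <- c(335, 336, 364, 566, 567, 590, 591)

# 1 where a value is exactly one more than its predecessor, else 0
step <- (diff(nums) == 1) * 1
# step: 1 0 0 1 0 1

# Run-length encode, then number the runs of 1s (1, 2, 3, ...)
r <- rle(step)
is_run <- r$values != 0
r$values[is_run] <- cumsum(r$values)[is_run]
run_id <- inverse.rle(r)
# run_id: 1 0 0 2 0 3

# diff() dropped one element: pad with 0, then copy each run's
# number back onto the first element of that run
run_id <- c(0, run_id)
run_id[which(run_id != 0) - 1] <- run_id[which(run_id != 0)]
# run_id: 1 1 0 2 2 3 3

# One asterisk per run number; 0 gives an empty string
stars <- strrep("*", run_id)
```

Because `cumsum()` runs over the `rle` values of a single group, the numbering restarts whenever the function is called on a new group.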

edited Mar 02 '16 at 15:01

answered Mar 02 '16 at 14:30

NicE


Can we get the same output structure as in the question? – nik Mar 02 '16 at 14:43
That is definitely very good, but there is a small problem: it does not restart the star count in each section, so the number of stars keeps growing. In that example we have three sections (4, 11 and 16), and I want the count to start over in each one so I don't end up with so many stars. Do you see what I mean? – nik Mar 02 '16 at 14:54
I see, I edited, just made it into a function and used `group_by(M1)` to apply the highlight function to each section – NicE Mar 02 '16 at 15:02
Look at `str(summ)`: it is a `data.table`. Try `summ <- as.data.frame(summ)`. – Pierre L Mar 02 '16 at 15:15
That worked , I don't know how to thank you, I accepted and liked your answer. – nik Mar 02 '16 at 15:21
@Pierre Lafortune I have posted a question here which no one answers, if you find some time, I will appreciate if you could comment there. in 3 hours I will put it as a bounty too http://stackoverflow.com/questions/35707323/how-to-rearrange-an-order-of-matches-between-two-data-frames – nik Mar 02 '16 at 15:23
Your methods are creating big problems for you down the line. Why would you add stars to numbers as you did in this question? You cannot do any mathematical or organizational operations on them that way. In the other question you create a useless link like `3-4`. That is not a programmer's mindset. You are using R to create marks and ticks as you would with pen and pencil. It is better to use the language in a way that exploits its strengths than to struggle with convoluted workarounds left and right. – Pierre L Mar 02 '16 at 15:28
@Pierre Lafortune I understand your remark and I appreciate it. It is not that I cannot do any mathematical operations on such data; it is very easy to get back to the first version. This is called highlighting inside your data, and it is meant to help understand the data rather than to play around. Since I use R, I don't want to jump to another program for this, so I keep everything in the same program as much as possible. That is the main reason I can't solve some of the problems I face. But again, thanks for your remark. – nik Mar 02 '16 at 15:42
I will try to help with the other question. – Pierre L Mar 02 '16 at 15:49
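
One way to act on the remark in the comments about keeping marks out of the data is to store the run number in a separate integer column and leave M2 untouched. The following is a minimal base-R sketch under that assumption; the helper `run_number()` and the column name `run_id` are hypothetical names chosen for illustration, and the data frame is a trimmed copy of the question's data (M1 and M2 only, without the leading spaces).

```r
# Trimmed copy of the question's data (M1 and M2 only)
df <- data.frame(
  M1 = c(4, rep(11, 6), rep(16, 7)),
  M2 = c("A", "B135", "B168", "B172", "B299", "B300", "B301",
         "B335", "B336", "B364", "B566", "B567", "B590", "B591"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 0 for values outside any run, otherwise the
# index (1, 2, 3, ...) of the consecutive run the value belongs to
run_number <- function(x) {
  nums <- suppressWarnings(as.numeric(gsub("\\D", "", as.character(x))))
  step <- diff(nums) == 1
  step[is.na(step)] <- FALSE
  # an element is in a run if it precedes or follows a step of exactly 1
  in_run <- c(step, FALSE) | c(FALSE, step)
  # a run starts at an in-run element that does not continue a previous one
  starts <- in_run & !c(FALSE, step)
  ifelse(in_run, cumsum(starts), 0L)
}

# Apply within each M1 group; ave() passes the row indices of each group
df$run_id <- ave(seq_len(nrow(df)), df$M1,
                 FUN = function(i) run_number(df$M2[i]))
```

With the run number kept numeric, the starred labels can still be produced for display only, for example with `paste0(df$M2, strrep("*", df$run_id))`, while M2 itself stays usable for sorting and matching.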
