How to remove part of a string based on a comma for all rows

Question

df <- structure(list(V = structure(c(4L, 5L, 3L, 7L, 6L, 2L, 1L), .Label = c("132 B26,172 B27,107 B57,104 B59,137 B60,133 B61,103 B62,134 B63,177 B100,123 B133,184 B168,109 B197,103 B198,173 B202,157 B203,143 B266,62 B342,62 B354,92 B355,195 B368,164 B370,52 B468,74 B469,71 B484,98 B494,66 B502,63 B601,133 B622", 
"135A,510A,511A,60 B23,67 B24,70 B25,95 B26,122 B27,123 B27,109 B60", 
"25A,28 B55,31 B56,45 B57,43 B58,5 B59,47 B59,6 B60,69 B60,66 B61", 
"267 B361,786 B363,543 B392", "563 B202,983 B360", "8 B1,12 B35,10 B71,9 B154,51 B179", 
"91 B26,117 B27,117 B28,102 B29,47 B31,96 B63,78 B64,133 B65,117 B66,121 B66,112 B67,127 B100"
), class = "factor")), .Names = "V", class = "data.frame", row.names = c(NA, 
-7L))

I want to have an output like this

V
    361, 363, 392
    202,360
    55,56,57,58,59,59,60,60,61
    26,27,28,29,31,63,64,65,66,66,67,100
    1,35,71,154,179
    23,24,25,26,27,27,60
    26,27,57,59,60,61,62,63,100,133,168,197,198,202,203,266,342,354,355,368,370,468,469,484,494,502,601,622

I have tried this for one string, and it works:

s = "267 B361"
s1 = unlist(strsplit(s, split='B', fixed=TRUE))[2]

but I don't know how to apply it to all the comma-separated strings in each row.

This is not the way to get attention here. You should have edited your question or started a bounty to attract more attention. Please delete your older question, as it is now a duplicate. – Cliff Burton, Mar 03 '16 at 12:46

score 1 · Accepted Answer · answered Mar 03 '16 at 11:52

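We can use str_extract_all from stringr to extract the digits that immediately follow an uppercase letter; the lookbehind (?<=[A-Z]) checks for the letter without including it in the match. The output is a list with one character vector per row, so loop over the list with sapply and collapse each element into a single string with toString (a wrapper for paste(., collapse=', ')).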

library(stringr)
sapply(str_extract_all(df$V, "(?<=[A-Z])\\d+"), toString)
#[1] "361, 363, 392"                                                                                                                     
#[2] "202, 360"                                                                                                                          
#[3] "55, 56, 57, 58, 59, 59, 60, 60, 61"                                                                                                
#[4] "26, 27, 28, 29, 31, 63, 64, 65, 66, 66, 67, 100"                                                                                   
#[5] "1, 35, 71, 154, 179"                                                                                                               
#[6] "23, 24, 25, 26, 27, 27, 60"                                                                                                        
#[7] "26, 27, 57, 59, 60, 61, 62, 63, 100, 133, 168, 197, 198, 202, 203, 266, 342, 354, 355, 368, 370, 468, 469, 484, 494, 502, 601, 622"
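
If you would rather not load stringr, roughly the same thing can be sketched in base R with gregexpr and regmatches. This is an alternative to the approach above, not a replacement for it. The vector `x` below is a small made-up sample in the same format as `df$V`; on the real data you would pass `as.character(df$V)`, since the column is a factor.

```r
# Hypothetical sample in the same "number, space, letter + number" format as df$V
x <- c("267 B361,786 B363,543 B392", "563 B202,983 B360", "25A,28 B55,31 B56")

# perl = TRUE is needed for the lookbehind (?<=[A-Z]);
# regmatches() returns a list with one character vector per input string
matches <- regmatches(x, gregexpr("(?<=[A-Z])\\d+", x, perl = TRUE))

# Collapse each list element into one comma-separated string
result <- vapply(matches, toString, character(1))
result
```

vapply is used here instead of sapply only so that the result is guaranteed to be a character vector, even when a row has no matches.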

akrun

@Mol Have you added the column name, i.e. `mydata$colname` – akrun Mar 03 '16 at 12:34
Can I use it like this to get the output as a data frame? df <- as.data.frame(sapply(str_extract_all(df$V, "(?<=[A-Z])\\d+"), toString)) – nik Mar 03 '16 at 12:35
@Mol The output is a `vector`. So you can just use `data.frame(v1 = sapply(str_extract_all...., stringsAsFactors=FALSE)` – akrun Mar 03 '16 at 12:35
If I use it the way you said, mF <- data.frame(v1 = sapply(str_extract_all(mydata$V, "(?<=[A-Z])\\d+"), toString, stringsAsFactors=FALSE)), I get an error: Error in mydata$V : $ operator is invalid for atomic vectors – nik Mar 03 '16 at 12:39
@Mol Do you have a `matrix` or a `data.frame` as input? Also, there should be a closing bracket after `toString` – akrun Mar 03 '16 at 12:39
The input is a data frame. I get the same error after adding the closing bracket: mF <- data.frame(v1 = sapply(str_extract_all(mydata$V, "(?<=[A-Z])\\d+"), toString), stringsAsFactors=FALSE) – nik Mar 03 '16 at 12:41
@Mol Using the data you showed, I am not getting an error. `head(data.frame(V1=sapply(str_extract_all(df$V, "(?<=[A-Z])\\d+"), toString),stringsAsFactors=FALSE),2)# # V1 #1 361, 363, 392 #2 202, 360` – akrun Mar 03 '16 at 12:42
@Mol Can you check the `str(sapply(str_extract_all(df$V, "(?<=[A-Z])\\d+"), toString))` – akrun Mar 03 '16 at 12:45
The problem is solved, thanks. Could you check this question, for which you solved the first part? http://stackoverflow.com/questions/35758748/how-to-remove-part-of-string-and-count-consecutive-values-with-a-comma-separated – nik Mar 03 '16 at 12:47
@Mol It seems to be deleted. – akrun Mar 03 '16 at 12:49
I will post only the second part as a question – nik Mar 03 '16 at 12:50
@Mol How do you count the consecutive numbers? `55-59` in the third row is one consecutive run, the second would be `59-60`, and the third `60-61` – akrun Mar 03 '16 at 12:54
this is the post http://stackoverflow.com/questions/35772784/count-both-a-set-of-consecutive-values-and-differences-between-them-in-a-row – nik Mar 03 '16 at 12:57
