
I hope to replace the first 14 dots of my.string with 14 zeroes when region = 2. All other dots should be kept the way they are.

df.1 = read.table(text = "
  city  county  state region                        my.string reg1 reg2
   1      1        1      1    123456789012345678901234567890   1    0
   1      2        1      1    ...................34567890098   1    0
   1      1        2      1    112233..............0099887766   1    0
   1      2        2      1    ..............2020202020202020   1    0
   1      1        1      2    ..............00..............   0    1
   1      2        1      2    ..............0987654321123456   0    1
   1      1        2      2    ..............9999988888777776   0    1
   1      2        2      2    ..................555555555555   0    1
", sep = "", header = TRUE, stringsAsFactors = FALSE)

df.1

I do not think this question has been asked here; sorry if it has. Sorry also not to have spent more time looking for the solution, but a quick Google search did not turn up an answer. I did ask a similar question here earlier: "R: removing the last three dots from a string". Thank you for any help.

I should clarify that I only want to replace the 14 consecutive dots at the far left of the string. If a string begins with a number that is followed by 14 dots, then those 14 dots should remain the way they are.
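A minimal sketch of the "far left" rule, assuming base R only: anchoring the pattern with `^` is what separates a string that starts with 14 dots from one where the 14 dots come after some digits. The three example strings are rebuilt from rows 4, 3 and 2 of df.1.

```r
# Example strings rebuilt from rows 4, 3 and 2 of df.1 above.
x <- c(paste0(strrep(".", 14), "2020202020202020"),      # 14 leading dots
       paste0("112233", strrep(".", 14), "0099887766"),  # dots after digits
       paste0(strrep(".", 19), "34567890098"))           # 19 leading dots

# "^" anchors the match at the start of the string and "\\." is a literal
# dot; an unescaped "." would match any character at all.
starts_with_14_dots <- grepl("^\\.{14}", x)
starts_with_14_dots
# TRUE FALSE TRUE
```

A leading run longer than 14 dots also matches, and only its first 14 dots would be replaced, which agrees with the last row of the desired output below (`00000000000000....555555555555`).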

Here is how my.string should look afterwards:

123456789012345678901234567890
...................34567890098
112233..............0099887766
..............2020202020202020
0000000000000000..............
000000000000000987654321123456
000000000000009999988888777776
00000000000000....555555555555
Mark Miller

4 Answers


Have you tried:

sub("^\\.{14}", "00000000000000", df.1$my.string )

For conditional replacement try:

> df.1[ df.1$region ==2, "mystring"] <- 
               sub("^\\.{14}", "00000000000000", df.1$my.string[ df.1$region==2] )
> df.1
  city county state region                      my.string reg1 reg2
1    1      1     1      1 123456789012345678901234567890    1    0
2    1      2     1      1 ...................34567890098    1    0
3    1      1     2      1 112233..............0099887766    1    0
4    1      2     2      1 ..............2020202020202020    1    0
5    1      1     1      2 ..............00..............    0    1
6    1      2     1      2 ..............0987654321123456    0    1
7    1      1     2      2 ..............9999988888777776    0    1
8    1      2     2      2 ..................555555555555    0    1
                        mystring
1                           <NA>
2                           <NA>
3                           <NA>
4                           <NA>
5 0000000000000000..............
6 000000000000000987654321123456
7 000000000000009999988888777776
8 00000000000000....555555555555
IRTFM
  • Can this be done "dynamically", as in: pad with 14 or fewer zeros depending on the number of `.` matched? Maybe with `gsubfn`? – Justin Jan 28 '13 at 23:59
  • Certainly `gsubfn` is an extraordinary invention but I don't think it needs that degree of firepower. – IRTFM Jan 29 '13 at 00:03
  • Thank you. If I make a tiny modification to the left-hand side of the line I get exactly what I want: `df.1$my.string[ df.1$region ==2] <- sub("^\\.{14}", "00000000000000", df.1$my.string[ df.1$region==2] )` – Mark Miller Jan 29 '13 at 05:32
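On the "dynamic" padding question, here is a sketch in base R (no gsubfn), assuming the goal is to replace a leading run of up to 14 dots with exactly as many zeros. It combines `regexpr()` with the `regmatches<-` replacement function and `strrep()` (available since R 3.3.0). The example strings are made up, and the region == 2 restriction is left out for brevity.

```r
# Made-up inputs: leading runs of 2, 0, 14 and 18 dots.
x <- c("..abc", "abc..",
       paste0(strrep(".", 14), "2020"),
       paste0(strrep(".", 18), "5555"))

# Locate the leading run of 1 to 14 dots; regexpr() returns -1 where
# a string has no leading dot.
m <- regexpr("^\\.{1,14}", x)

# regmatches<- expects one replacement per matched element, so keep only
# the lengths of actual matches (non-matches have match.length -1).
len <- attr(m, "match.length")
regmatches(x, m) <- strrep("0", len[len > 0])
x
```

Because the quantifier is `{1,14}`, a run of 18 dots still gets only 14 zeros, and the remaining 4 dots are kept.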
    gsub('^[.]{14,14}', paste(rep(0, 14), collapse = ''), df.1$my.string)
[1] "123456789012345678901234567890" "00000000000000.....34567890098" "112233..............0099887766"
[4] "000000000000002020202020202020" "0000000000000000.............." "000000000000000987654321123456"
[7] "000000000000009999988888777776" "00000000000000....555555555555"
agstudy
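The `gsub()` call above works on every row, so it also changes rows 2 and 4, which are in region 1. A sketch of one way to limit the substitution to region == 2 while overwriting my.string in place, assuming a two-row stand-in for df.1 built from rows 4 and 6 of the question's data:

```r
# Hypothetical two-row stand-in for df.1 (rows 4 and 6; only the columns used).
df.1 <- data.frame(
  region    = c(1, 2),
  my.string = c(paste0(strrep(".", 14), "2020202020202020"),
                paste0(strrep(".", 14), "0987654321123456")),
  stringsAsFactors = FALSE
)

zeros <- strrep("0", 14)

# ifelse() is vectorised: region-2 rows get the anchored substitution,
# all other rows keep their original string.
df.1$my.string <- ifelse(df.1$region == 2,
                         sub("^\\.{14}", zeros, df.1$my.string),
                         df.1$my.string)
df.1$my.string
```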

dwin's answer is awesome. Here's one that's easy to understand, but not nearly as spiffy:

# restrict the substitution to only region == 2..
# then replace the 'my.string' column with..
df.1[ df.1$region == 2 , 'my.string' ] <- 

    # substitute.. (only the first instance!)
    # (use gsub for multiple instances)
    sub( 
        # fourteen dots
        '..............' , 
        # with fourteen zeroes
        '00000000000000' , 
        # in the same object (also restricted to region == 2)
        df.1[ df.1$region == 2 , 'my.string' ] , 
        # and don't use regex or anything special.
        # just exactly 14 dots.
        fixed = TRUE 
    )
Anthony Damico
  • you'll wanna escape each of those dots since `.` means any character in a regular expression. – Justin Jan 29 '13 at 00:04
  • I like all of the answers, and up-voted all of them, but right now yours is my favorite because it returns the complete data set exactly the way I hoped. I will wait until tomorrow or later to give a check-mark. – Mark Miller Jan 29 '13 at 00:12
  • @MarkMiller dwin's answer is better. You should accept his, not mine. – Anthony Damico Jan 29 '13 at 00:14
  • This won't meet the "far left of the string" requirement - it will just replace the first occurrence of 14 dots it finds. The regexp `^` anchor is important here... – Charles Jan 29 '13 at 04:36
  • @Charles one of a few reasons I recommended markmiller accept the answer by dwin, not me. :) My version is easy to understand but clunky. – Anthony Damico Jan 29 '13 at 13:14
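To illustrate the two comments above with row 3 of df.1 (digits followed by 14 dots), a sketch comparing three calls. With `fixed = TRUE` the dots are taken literally, so they need no escaping, but the match is not anchored; the anchored regex leaves this string alone; and the same 14-dot pattern used as a regex without `fixed = TRUE` replaces the first 14 characters of any kind.

```r
zeros <- strrep("0", 14)

# Row 3 of df.1: six digits, then 14 dots in the middle of the string.
s <- paste0("112233", strrep(".", 14), "0099887766")

# Literal match of 14 dots anywhere: the dots in the middle are replaced.
fixed_result <- sub(strrep(".", 14), zeros, s, fixed = TRUE)

# Anchored regex: no match at the start, so s is returned unchanged.
anchored_result <- sub("^\\.{14}", zeros, s)

# Unescaped regex: "." matches any character, so the first 14 characters
# ("112233" plus 8 dots) are replaced.
regex_result <- sub(strrep(".", 14), zeros, s)

fixed_result
anchored_result
regex_result
```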

A data.table solution:

library(data.table)
dt <- data.table(df.1)

# solution (ifelse() and sub() are vectorised, so no per-row grouping is needed):
dt[, mystring := ifelse(region == 2, sub("^[.]{14}", 
                   paste(rep(0,14), collapse=""), my.string), 
                   my.string)]

#    city county state region                      my.string reg1 reg2                       mystring
# 1:    1      1     1      1 123456789012345678901234567890    1    0 123456789012345678901234567890
# 2:    1      2     1      1 ...................34567890098    1    0 ...................34567890098
# 3:    1      1     2      1 112233..............0099887766    1    0 112233..............0099887766
# 4:    1      2     2      1 ..............2020202020202020    1    0 ..............2020202020202020
# 5:    1      1     1      2 ..............00..............    0    1 0000000000000000..............
# 6:    1      2     1      2 ..............0987654321123456    0    1 000000000000000987654321123456
# 7:    1      1     2      2 ..............9999988888777776    0    1 000000000000009999988888777776
# 8:    1      2     2      2 ..................555555555555    0    1 00000000000000....555555555555
Arun