Remove some characters from a string by index (Raku)

Question

FAQ: In Raku, how do you remove some characters from a string, based on their index?

Say I want to remove indices 1 to 3 and 8:

xxx("0123456789", (1..3, 8).flat);  # 045679

Sebastian · Accepted Answer · 2020-03-27T09:18:39.067

14

Variant of Shniperson's answer:

my $a='0123456789';
with $a {$_=.comb[(^* ∖ (1..3, 8).flat).keys.sort].join};
say $a;

In one line:

say '0123456789'.comb[(^* ∖ (1..3, 8).flat).keys.sort].join;

or called by a function:

sub remove($str, $a) {
    $str.comb[(^* ∖ $a.flat).keys.sort].join;
}

say '0123456789'.&remove: (1..3, 8);

or with augmentation of Str:

use MONKEY-TYPING;
augment class Str {
    method remove($a) {
        $.comb[(^* ∖ $a.flat).keys.sort].join;
    }
};

say '0123456789'.remove: (1..3, 8);

edited Mar 27 '20 at 09:18

answered Mar 26 '20 at 18:03

Sebastian

That solves the problem completely in my opinion. Thanks for reminding me that ∖ and (-) are equivalent. I don't see other ways to [slice](https://docs.raku.org/language/subscripts#Slices) with the indices I __don't__ want rather than the indices I want. – Tinmarino Mar 26 '20 at 18:16

You don't have to use `MONKEY-TYPING` if you just make it a free-floating method and call it as `'foobar'.&remove: (1..2, 4);` (augment can have problems with composition if used several times) – user0721090601 Mar 26 '20 at 19:07
(which isn't to say augment is bad, just that `.&remove` is a way to do without it) – user0721090601 Mar 26 '20 at 19:12
I added the non-augmentation variant per your suggestion. Thank you. – Sebastian Mar 26 '20 at 19:42
Probably combining chenyf's pairs and !(elem) with map would be even more elegant, with no sorting required: `.comb.pairs.map({.value if .key ∉ (1..3, 8).flat}).join`? – Sebastian Mar 26 '20 at 19:44
I could shorten (0 .. .chars - 1) to just the Whatever form ^* – Sebastian Mar 27 '20 at 09:13

∖ is confusing; it looks like a backslash character. – Shniperson Mar 28 '20 at 07:13

Yeah, looks much like a backslash, but is not. But it is the usual character for a difference of sets, e.g. https://en.wikipedia.org/wiki/Complement_(set_theory)#Definition – Sebastian Mar 28 '20 at 08:50

score 12 · Answer 2 · answered Mar 26 '20 at 02:38

.value.print if .key  !(elem) (1,2,3,8) for '0123456789'.comb.pairs

chenyf

raiph · Answer 3 · 2020-03-28T13:51:17.433

My latest idea for a not-at operation (I'll cover the implementation below):

Usage:

say '0123456789'[- 1..3, 8 ]; # 045679

Implementation, wrapping (a variant of) Brad's solution:

multi postcircumfix:<[- ]> (|args) { remove |args }

sub remove( Str:D $str is copy, +@exdices){
    for @exdices.reverse {
        when Int   { $str.substr-rw($_,1) = '' }
        when Range { $str.substr-rw($_  ) = '' }
    }
    $str
}

say '0123456789'[- 1..3, 8 ]; # 045679

The syntax to use the operator I've declared is string[- list-of-indices-to-be-subtracted ], i.e. using familiar [...] notation, but with a string on the left and an additional minus after the opening [ to indicate that the subscript contents are a list of exdices rather than indices.

[Edit: I've replaced my original implementation with Brad's. That's probably wrong-headed because, as Brad notes, his solution "assumes that the [exdices] are in order from lowest to highest, and there is no overlap.", and while his solution doesn't promise otherwise, using [- ... ] comes awfully close to doing so. So if this syntax sugar were to be used by someone, they should probably not use Brad's solution. Perhaps there is a way to eliminate the assumption Brad's solution makes.]

I like this syntax but am aware that Larry deliberately did not build in use of [...] to index strings so perhaps my syntax here is inappropriate for widespread adoption. Perhaps it would be better if some different bracketing characters were used. But I think use of a simple postcircumfix syntax is nice.

(I've also tried to implement a straight [ ... ] variant for indexing strings in exactly the same way as for Positionals but have failed to get it to work for reasons beyond me tonight. Weirdly [+ ... ] will work to do exdices but not to do indices; that makes no sense to me at all! Anyhow, I'll post what I have and consider this answer complete.)

[Edit: The above solution has two aspects that should be seen as distinct. First, a user-defined operator, the syntactic sugar provided by the postcircumfix:<[- ]> (Str ..., declaration. Second, the body of that declaration. In the above I've used (a variant of) Brad's solution. My original answer is below.]

~~Because your question boils down to removing some indices of a .comb, and rejoining the result, your question is essentially a duplicate of ...~~ [Edit: Wrong, per Brad's answer.]

The question "What is a quick way to de-select array or list elements?" adds yet more solutions for the [.comb ... .join] answers here.

Implemented as two multis so the same syntax can be used with Positionals:

multi postcircumfix:<[- ]> (Str $_, *@exdex) { .comb[- @exdex ].join }

multi postcircumfix:<[- ]> (@pos,   *@exdex) { @pos[sort keys ^@pos (-) @exdex] }

say '0123456789'[- 1..3, 8 ]; # 045679

say (0..9)[- 1..3, 8 ];       # (0 4 5 6 7 9)

The sort keys ^@pos (-) @exdex implementation is just a slightly simplified version of @Sebastian's answer. I haven't benchmarked it against jnthn's solution from the earlier answer I linked above, but if that's faster then it can be swapped in instead. *[Edit: Obviously it should instead be Brad's solution for the string variant.]*

"I think use of a simple postcircumfix syntax is nice". Definitely! I love this solution: super clear to read. – Tinmarino Mar 27 '20 at 03:33
I've always felt that "negative" (or "negative-appearing") indices are inherently problematic. – jubilatious1 Sep 23 '21 at 04:07
@jubilatious1 If someone doesn't like `-`, they could pick a different character/string that signifies "exclude these". Perhaps `[except ...]`? Brevity was the soul of this answer, so I'd be happier with `[not ...]`. More generally, such things are somewhat subjective. Note @Tinmarino's reaction. To be clear, the `-` isn't "negative" in a numeric addition/subtraction sense, but in the regex character class sense -- `<-[abc]>`, i.e. "exclude these". It's only "negative-appearing" in the sense it's `-`, a character, like it is in "negative-appearing". In the latter, it's understood as "hyphen". – raiph Sep 23 '21 at 11:23
Hello @raiph! Yes, agreed on what you said. My hope was to **stimulate** discussion on what those non-negative, non-hyphen looking characters might be. Tossing out random ideas here... something like `⊣` (LEFT TACK, Unicode: U+22A3)? Of course, this could conflict with a pre-existing meaning. – jubilatious1 Sep 23 '21 at 16:40
@jubilatious1 Please join me in [a chat room I just made for this discussion](https://chat.stackoverflow.com/rooms/237429/except-these). – raiph Sep 23 '21 at 18:20
@raiph @Sebastian Sorry for the lack of clarity in my last message. If I call the Raku one-liner `raku -e '"0123456789".elems.put;'` at the bash command line, I get the result `1`. The string `0123456789` is one element. Conceptually it's difficult to understand how a postfix `[ … ]` can make any meaningful change to an object with only one (`1`) element. – jubilatious1 Sep 25 '21 at 17:40

score 9 · Answer 4 · edited Sep 12 '21 at 19:11

Everyone is either turning the string into a list using comb or using a flat list of indices.

There is no reason to do either of those things.

sub remove( Str:D $str is copy, +@indices ){
    for @indices.reverse {
        when Int   { $str.substr-rw($_,1) = '' }
        when Range { $str.substr-rw($_  ) = '' }
    }
    $str
}

remove("0123456789",  1..3, 8 );  # 045679
remove("0123456789", [1..3, 8]);  # 045679

The above assumes that the indices are in order from lowest to highest, and there is no overlap.
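
If that assumption does not hold, one possible workaround is to expand every Int and Range into single indices, deduplicate them, and remove them from the highest index down. The sketch below is not part of the original answer: the name remove-unordered is hypothetical, and it assumes every index lies within the string.

```raku
# Hypothetical helper (not from the answer): tolerates unordered,
# overlapping, or duplicated indices. Assumes all indices are in range.
sub remove-unordered(Str:D $str is copy, +@indices) {
    # Expand each Int or Range into individual indices, drop duplicates,
    # then visit them from highest to lowest so that removing one
    # character never shifts the position of one still to be removed.
    my @flat = @indices.map(*.list.Slip).unique.sort.reverse;
    for @flat { $str.substr-rw($_, 1) = '' }
    $str
}

say remove-unordered("0123456789", 8, 2..3, 1..3);  # 045679
```

The trade-off is that ranges are removed one character at a time, so this gives up some of the benefit of deleting a whole range with a single substr-rw call.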

edited Sep 12 '21 at 19:11

codesections

answered Mar 27 '20 at 23:35

Brad Gilbert

This is the fastest answer by a factor of 150 on my machine (with `my $s = "0123456789" x 1000; my $l = (1..3, 8, 40, 100, 1001, 4000..4100).flat`). Comb is slow for long strings. Thanks @BradGilbert, this will definitely help some people, at least me :-) – Tinmarino Mar 28 '20 at 00:47

@Tinmarino That is because MoarVM doesn't usually copy strings, instead it creates substring objects that point into the original string. When you use `.comb` it has to create many of those objects, and combine them back together. With `substr` it creates as few of those objects as possible. – Brad Gilbert Mar 28 '20 at 02:10
"substring objects that point into the original string": is that why it was decided to implement Str as immutable? Impressive optimization anyway. – Tinmarino Mar 28 '20 at 02:19
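
For anyone wanting to reproduce the comparison discussed in these comments, here is a rough timing sketch. It reuses the test data from Tinmarino's comment and the remove sub from the answer; the variable names are made up for illustration, the .comb variant is one of several possible formulations, and the timings will differ by machine and Rakudo version.

```raku
# Brad Gilbert's substr-rw approach, copied from the answer above.
sub remove(Str:D $str is copy, +@indices) {
    for @indices.reverse {
        when Int   { $str.substr-rw($_, 1) = '' }
        when Range { $str.substr-rw($_   ) = '' }
    }
    $str
}

# Test data taken from Tinmarino's comment.
my $s = "0123456789" x 1000;
my @l = (1..3, 8, 40, 100, 1001, 4000..4100).flat;

# .comb-based removal: split into characters, keep the complement, rejoin.
my $t0        = now;
my $via-comb  = $s.comb[(^$s.chars (-) @l).keys.sort].join;
my $comb-time = now - $t0;

# substr-rw-based removal: ranges stay ranges, so fewer pieces are created.
my $t1          = now;
my $via-substr  = remove($s, 1..3, 8, 40, 100, 1001, 4000..4100);
my $substr-time = now - $t1;

say "same result: ", $via-comb eq $via-substr;
say "comb:   $comb-time s";
say "substr: $substr-time s";
```

A single run like this is only indicative; repeating each measurement several times would give a steadier number.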

score 8 · Answer 5 · answered Mar 26 '20 at 13:00

Yet more variants:

print $_[1] if $_[0] !(elem) (1,2,3,8) for ^Inf Z 0..9;

.print for ((0..9) (-) (1,2,3,8)).keys;

Shniperson

score 8 · Answer 6 · answered Mar 26 '20 at 17:15

This is the closest I got in terms of simplicity and shortness.

say '0123456789'.comb[ |(3..6), |(8..*) ].join
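
Note that the slice above keeps indices 3..6 and 8 onward. Applied to the question's example (drop 1..3 and 8), the same keep-list idea might look like the minimal sketch below. The variable name $kept is just for illustration, and the explicit final index 9 assumes a ten-character string.

```raku
# Keep indices 0, 4..7 and 9, i.e. everything except 1..3 and 8.
# The | slips the range into the index list instead of nesting it.
my $kept = '0123456789'.comb[0, |(4..7), 9].join;
say $kept;  # 045679
```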

Holli

score 5 · Answer 7 · answered Mar 26 '20 at 16:27

my $string='0123456789';
for (1..3, 8).flat.reverse { $string.substr-rw($_, 1) = '' }

Sebastian

codesections · Answer 8 · 2021-09-12T20:43:09.243

I know this is an old-ish question, but here's another version that takes a similar approach to Brad Gilbert's substr-rw solution but avoids needing to edit the string in-place (and thus fits better with a functional-programming mindset):

sub remove(Str:D $str, +@indices) {
    ($str, |@indices.reverse).reduce: -> $_, $idx {
       { S/.**{$idx.head} <(.**{$idx.elems})> .*// }}
}


say remove "0123456789", 1..3, 8;    # OUTPUT: «045679»
say remove "0123456789", [1..3, 8];  # OUTPUT: «045679»

This is not that different from the substr-rw solution when both are wrapped up in a function. But the version above lends itself to better inline use, at least imo:

say S/.**5 <(.**3)> .*// with "0123456789"; # OUTPUT: «0123489»

instead of

say do with "0123456789" -> $_ is copy { .substr-rw(5..7) = ''; $_ } # OUTPUT: «0123489»

Note, however, that the S///-based approach isn't any faster than the .comb approaches, so using .substr-rw could be useful in performance-critical areas.
