perl6 What is a quick way to de-select array or list elements?

Question

To select multiple elements from an array in perl6, it is easy: just use a list of indices:

> my @a = < a b c d e f g >;
> @a[ 1,3,5 ]
(b d f)

But to de-select those elements, I had to use Set:

> say @a[ (@a.keys.Set (-) (1,3,5)).keys.sort ]
(a c e g)

I am wondering if there is an easier way because the arrays I use are often quite large?

Perhaps an array is not the best way to do these things? Perhaps a hash would be better, using the indexes as keys? — Elizabeth Mattijsen, Aug 20 '19 at 09:48
Oooh, this sounds like the perfect opportunity for a custom post-fix operator. Maybe `@a-[ … ]-` (or the opposite, `@a[- … -]` — user0721090601, Aug 20 '19 at 13:48

raiph · Accepted Answer · 2019-08-20T20:59:04.810

10

sub infix:<not-at> ($elems, @not-ats) {
  my $at = 0;
  flat gather for @not-ats -> $not-at {
    when $at < $not-at { take $at++ xx $not-at - $at } 
    NEXT { $at++ }
    LAST { take $at++ xx $elems - $not-at - 1 }
  }
}

my @a = < a b c d e f g >;
say @a[ * not-at (1, 3, 5) ]; # (a c e g)

I think the operator code is self-explanatory if you know each of the P6 constructs it uses. If anyone would appreciate an explanation of it beyond the following, let me know in the comments.

I'll start with the two aspects that generate the call to not-at.

`*` aka `Whatever`

From the Whatever doc page:

When * is used in term position, that is, as an operand, in combination with most operators, the compiler will transform the expression into a closure of type WhateverCode

* is indeed used in the above as an operand. In this case it's the left argument (corresponding to the $elems parameter) of the infix not-at operator that I've just created.

The next question is, will the compiler do the transform? The compiler decides based on whether the operator has an explicit * as the parameter corresponding to the * argument. If I'd written * instead of $elems then that would have made not-at one of the few operators that wants to directly handle the * and do whatever it chooses to do and the compiler would directly call it. But I didn't. I wrote $elems. So the compiler does the transform I'll describe next.

The transform builds a new WhateverCode around the enclosing expression and rewrites the Whatever as "it" aka the topic aka $_ instead. So in this case it turns this:

* not-at (1,3,5)

into this:

{ $_ not-at (1,3,5) }

What `[...]` as a subscript does

The [...] in @a[...] is a Positional (array/list) subscript. This imposes several evaluation aspects, of which two matter here:

"it" aka the topic aka $_ is set to the length of the list/array.
If the content of the subscript is a Callable it gets called. The WhateverCode generated as explained above is indeed a Callable so it gets called.

So this:

@a[ * not-at (1,3,5) ]

becomes this:

@a[ { $_ not-at [1,3,5] } ]

which turns into this:

 @a[ { infix:not-at(7, [1,3,5]) } ]

edited Aug 20 '19 at 20:59

answered Aug 20 '19 at 15:01

raiph

31,607
3
62
111

I still vote for the negative post circumfix, but +1 for all the rest – user0721090601 Aug 20 '19 at 18:04
I tried but failed to get that to work. `sub postcircumfix:<[ ]-> (@positional, *@index) { @positional[infix:not-at(@positional.elems, @index)]` and variations yield a compile-time error. If I could get it to work I'd add it to my answer. But I also felt like I wanted an approach that would work for a wide and extensible range of index selection operations and `* foo-at (indices)` looked to me like it would do the job in an appropriate and nicely readable manner. – raiph Aug 20 '19 at 18:35
raiph: ah, interesting. In any case, your answer very nicely shows the power of the whatever in the `[ ]` circumfix. If it weren't for the existence of `pick` or `roll`, one could easily imagine enabling it via this method. – user0721090601 Aug 20 '19 at 18:55
Personally, I prefer { take $at++ xx ( $not-at - $at ) } . Please can you clarify why cache is needed? – librasteve Aug 20 '19 at 20:42
@p6steve The `cache` was a left over from earlier, worse code. It is unnecessary and I've removed it. Thanks. :) – raiph Aug 20 '19 at 21:41
My take on "superstitious parens" is based on a mix of factors. I find my speed and confidence of code comprehension is improved both for specific code I write and for P6 code overall if I omit most unnecessary parens. (I know that folk differ in this regard and might write the same code differently in different circumstances.) The main factor I recall considering for this particular case was the subtle bug thing. But the code would produce absurd results in almost all cases if the precedence is wrong. I see leaving out the parens as a strong signal to readers that the precedence is correct. – raiph Aug 20 '19 at 21:50
Thank you raiph for the operator and the explanation of the whatever code, which I am still experimenting on !!! – lisprogtor Aug 21 '19 at 06:48
@lisprogtor FYI I'm doubting my explanation now. I'm spelunking the source code to nail down the truth. I plan to update my answer in a short while. – raiph Aug 21 '19 at 10:45

score 9 · Answer 2 · answered Aug 20 '19 at 23:39

Given the indexer wants the elements to extract, we could solve this by turning the list of elements to exclude into a list of ranges of elements to extract. That is, given:

1, 3, 5

We'd produce something equivalent to:

0..0, 2..2, 4..4, 6..Inf

Given:

my @exclude = 1, 3, 5;

We can do:

-1, |@exclude Z^..^ |@exclude, Inf

To break it down, zip (-1, 1, 3, 5) with (1, 3, 5, Inf), but using the range operator with exclusive endpoints. This results in, for the given example:

(-1^..^1 1^..^3 3^..^5 5^..^Inf)

Which is equivalent to the ranges I mentioned above. Then we stick this into the indexer:

my @a = <a b c d e f g>
my @exclude = 1, 3, 5;
say @a[-1, |@exclude Z^..^ |@exclude, Inf].flat

Which gives the desired result:

(a c e g)

This approach is O(n + m). It will probably work out quite well if there array is long, but the number of things to exclude is comparatively small, since it only produces the Range objects needed for the indexing, and then indexing by range is relatively well optimized.

Finally, should the flat on the outside be considered troublesome, it's also possible to move it inside:

@a[{ flat -1, |@exclude Z^..^ |@exclude, $_ }]

Which works because the block is passed the number of elements in @a.

Thank you Jonathan Worthington for shining light from a different angle !!! — lisprogtor, Aug 21 '19 at 06:50

score 7 · Answer 3 · answered Aug 20 '19 at 19:58

7

Here's another option:

my @a = < a b c d e f g >;
say @a[@a.keys.grep(none(1, 3, 5))];

But all in all, arrays aren't optimized for this use case. They are optimized for working with a single element, or all elements, and slices provide a shortcut for (positively) selecting several elements by key.

If you tell us about the underlying use case, maybe we can recommend a more suitable data structure.

answered Aug 20 '19 at 19:58

moritz

12,710
1
41
63

For the underlying use case, I am data-mining huge text files, selecting certain elements of each very long line, moving them to the front, and move the rest to the back of the line. i.e., move @a[1,5,9,1001] to front and move @a[without 1,5,9,1001] to the back of the line; thanks. – lisprogtor Aug 21 '19 at 07:03
@lisprogtor you might want to consider using [splice](https://docs.perl6.org/routine/splice) to remove the elements, and handle them separately. – moritz Aug 22 '19 at 07:16

score 3 · Answer 4 · answered Aug 21 '19 at 11:40

This might be slow for big arrays, but it's logically the closer to what you're looking for:

my @a = <a b c d>;

say (@a ⊖ @a[0,1]).keys; # (c d)

It's basically the same solution you proposed at the beginning, using set difference, except it's using it on the whole array instead of on the indices. Also, in some cases you might use the set directly; it depends on what you want to do.

Sebastian · Answer 5 · 2020-03-27T14:41:08.957

@raiphs solution combined with @Jonathan Worthington 's:

The operator should be very efficient for huge numbers and large @not-ats lists as it returns a list of ranges, and it even creates that list of ranges lazily. For the @not-ats it supports integers and ranges with included and excluded bounds and infinity. But it has to be ascending.

The $elems can be a Range or an Int. It is interpreted as .Int as in Jonathan Worthington's solution to support (but needs a .flat applying it to array slicing - the price of performance for the lazy operator - this could be changed by using flat gather instead of lazy gather in the 2nd line)

@a[ (* not-at (1, 3, 5)).flat ];

or newly support

@a[ (* not-at (1, 3^ .. 5, 8 .. 8, 10, 14 .. ^18, 19 .. *)).flat ];

The performance improvements can be seen, when not slicing the array at once, but operating on parts of the array, optimally with multithreading.

sub infix:<not-at> ($elems, @not-ats) {
    lazy gather {
        my $at = 0;
        for @not-ats {                                               # iterate over @not-ats ranges
            my ($stop, $continue) = do given $_ {
                when Int            { succeed $_,       $_         } # 5
                when !.infinite     { succeed .int-bounds          } # 3..8 | 2^..8 | 3..^9 | 2^..^9
                when !.excludes-min { succeed .min,     $elems.Int } # 4..*
                default             { succeed .min + 1, $elems.Int } # 3^..*
            }
            take $at .. $stop - 1 if $at < $stop;                    # output Range before current $not-at range
            $at = $continue + 1;                                     # continue after current $not-at range
        }
        take $at .. $elems.Int - 1 if $at < $elems;                  # output Range with remaining elements
    }
}

perl6 What is a quick way to de-select array or list elements?

5 Answers5

`*` aka `Whatever`

What `[...]` as a subscript does

Linked

perl6 What is a quick way to de-select array or list elements?

5 Answers5

* aka Whatever

What [...] as a subscript does

Linked

`*` aka `Whatever`

What `[...]` as a subscript does