
How can I simplify this function that converts strings of slices for PyTorch / NumPy to slice list objects that can then be used to slice arrays & tensors?

The code below works, but it seems rather inefficient in terms of how many lines it takes.

def str_to_slice_indices(slicing_str: str):
    # Convert indices to lists
    indices = [
        [i if i else None for i in indice_set.strip().split(":")]
        for indice_set in slicing_str.strip("[]").split(",")
    ]

    # Handle Ellipsis "..."
    indices = [
        ... if index_slice == ["..."] else index_slice for index_slice in indices
    ]
    # Handle "None" values
    indices = [
        None if index_slice == ["None"] else index_slice for index_slice in indices
    ]
    # Handle single number values
    indices = [
        int(index_slice[0])
        if isinstance(index_slice, list)
        and len(index_slice) == 1
        and index_slice[0].lstrip("-").isdigit()
        else index_slice
        for index_slice in indices
    ]

    # Create indice slicing list
    indices = [
        slice(*[int(i) if i and i.lstrip("-").isdigit() else None for i in index_slice])
        if isinstance(index_slice, list)
        else index_slice
        for index_slice in indices
    ]
    return indices

Running the above function with an example covering the various types of input gives this:

out = str_to_slice_indices("[None, :1, 3:4, 2, :, 2:, ...]")
print(out)

# out:
# [None, slice(None, 1, None), slice(3, 4, None), 2, slice(None, None, None), slice(2, None, None), Ellipsis]
ProGamerGov
    `eval(f'np.s_{"[None, :1, 3:4, 2, :, 2:, ...]"}')` to `eval` the string `np.s_[None, :1, 3:4, 2, :, 2:, ...]` – Michael Szczesny Apr 02 '22 at 15:38
  • @MichaelSzczesny I thought that it was a bad idea to use eval, as it's a massive security risk? Also, I'd like to avoid using NumPy functions for this. – ProGamerGov Apr 02 '22 at 15:45

2 Answers


Iterating multiple times is not necessary. The sample string has been slightly expanded to test more cases.

def str2slices(s):
    d = {True: lambda e: slice(*[int(i) if i else None for i in e.split(':')]),
        'None': lambda e: None,
        '...': lambda e: ...}
    return [d.get(':' in e or e.strip(), lambda e: int(e))(e.strip()) for e in s[1:-1].split(',')]

str2slices('[None, :1, 3:4, 2, :, -10: ,::,:4:2, 1:10:2, -32,...]')

Output

[None,
 slice(None, 1, None),
 slice(3, 4, None),
 2,
 slice(None, None, None),
 slice(-10, None, None),
 slice(None, None, None),
 slice(None, 4, 2),
 slice(1, 10, 2),
 -32,
 Ellipsis]

The same errors as in the OP's solution are caught. Unsupported input does not silently change the result; it raises a ValueError.
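A minimal sketch of that behavior, repeating the str2slices definition from above so that it runs on its own. The helper try_parse and the malformed input strings are made up for illustration:

```python
def str2slices(s):
    # Same dispatch table as above: slices, None, and Ellipsis each get a handler.
    d = {True: lambda e: slice(*[int(i) if i else None for i in e.split(':')]),
         'None': lambda e: None,
         '...': lambda e: ...}
    return [d.get(':' in e or e.strip(), lambda e: int(e))(e.strip()) for e in s[1:-1].split(',')]


def try_parse(s):
    # Hypothetical helper: return the parsed list, or the name of the exception raised.
    try:
        return str2slices(s)
    except ValueError as exc:
        return type(exc).__name__


print(try_parse('[1:3, 5]'))  # valid input
print(try_parse('[a:b]'))     # non-numeric slice bound, so int('a') fails
print(try_parse('[foo]'))     # unknown token, so int('foo') fails
```

Both malformed strings end up in an int() call, which is where the ValueError comes from.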


Breakdown of the solution

This assumes familiarity with string slicing and the split method.

With the example

s = '[None, :1, 3:4, 2, :, -10: ,::,:4:2, 1:10:2, -32,...]'

we can find slices with

[':' in e for e in s[1:-1].split(',')]
#[False, True, True, False, True, True, True, True, True, False, False]

Using the short-circuiting behavior of or, we can distinguish the other cases

[':' in e or e.strip() for e in s[1:-1].split(',')]
#['None', True, True, '2', True, True, True, True, True, '-32', '...']

These values can be used as keys of a dictionary

d = {True: 'slice', 'None': None, '...': ...}
[d[':' in e or e.strip()] for e in s[1:-1].split(',')]
#KeyError: '2'

To prevent the KeyError we can use the get method with a default value.

d = {True: 'slice', 'None': None, '...': ...}
[d.get(':' in e or e.strip(), 'number') for e in s[1:-1].split(',')]
#[None, 'slice', 'slice', 'number', 'slice', 'slice', 'slice', 'slice', 'slice', 'number', Ellipsis]

In order to process slices, we need to parse additional values at runtime. So we use lambdas as dictionary values to be able to call them with (e.strip()). Finally, we convert values to int if necessary.

d = {True: lambda e: slice(*[int(i) if i else None for i in e.split(':')]),
    'None': lambda e: None,
    '...': lambda e: ...}
[d.get(':' in e or e.strip(), lambda e: int(e))(e.strip()) for e in s[1:-1].split(',')]

Output

[None,
 slice(None, 1, None),
 slice(3, 4, None),
 2,
 slice(None, None, None),
 slice(-10, None, None),
 slice(None, None, None),
 slice(None, 4, 2),
 slice(1, 10, 2),
 -32,
 Ellipsis]
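For comparison, the same dispatch can be written with explicit branches instead of a dictionary of lambdas. This is only a sketch of an equivalent formulation, not part of the solution above; the names parse_index and str2slices_explicit are hypothetical:

```python
def parse_index(token):
    # Hypothetical helper: convert one comma-separated token to an index object.
    token = token.strip()
    if ':' in token:
        # Empty parts (as in ':1' or '::') become None, all others become ints.
        return slice(*[int(part) if part else None for part in token.split(':')])
    if token == 'None':
        return None
    if token == '...':
        return ...
    return int(token)  # raises ValueError for anything that is not an integer


def str2slices_explicit(s):
    # Drop the surrounding brackets, then parse each token on its own.
    return [parse_index(token) for token in s[1:-1].split(',')]


print(str2slices_explicit('[None, :1, 3:4, 2, :, -10: ,::,:4:2, 1:10:2, -32,...]'))
```

The branches are checked in the same order as the dictionary lookup (slice first, then None and Ellipsis, with int as the fallback), so the result for the sample string should match the output shown above.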
Michael Szczesny

@Michael suggested using eval on an np.s_ expression.

Another way to demonstrate this is to define a simple class that just accepts a getitem tuple:

In [83]: class Foo():
    ...:     def __getitem__(self, arg):
    ...:         print(arg)
    ...: 
In [84]: Foo()[None, :1, 3:4, 2, :, 2:, ...]
(None, slice(None, 1, None), slice(3, 4, None), 2, slice(None, None, None), slice(2, None, None), Ellipsis)

In normal Python usage, it's the interpreter that converts the ':::' kinds of strings into slice (and related) objects. And it only does so within indexing expressions. Effectively your code tries to replicate the work that the interpreter normally does.
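As a small sketch of that point, a variant of the class that returns the argument instead of printing it makes the converted objects available for inspection. The name IndexGrabber is hypothetical:

```python
class IndexGrabber:
    # Hypothetical helper: hand back whatever the interpreter passes to __getitem__.
    def __getitem__(self, arg):
        return arg


grab = IndexGrabber()

# A single index arrives as one object; several comma-separated ones arrive as a tuple.
print(grab[:1])
print(grab[None, :1, 3:4, 2, :, 2:, ...])
```

No string parsing happens here: the slice objects are built by the interpreter before __getitem__ is called.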

I haven't paid enough attention to the eval security issues to know what you have to add. It seems that the indexing syntax is pretty restrictive as it is.

It looks like strings that don't fit the slice and ellipsis syntax are passed through unchanged and unevaluated.

In [90]: Foo()['if x is 1:print(x)']
if x is 1:print(x)

My Foo and np.s_ don't try to evaluate the tuple that __getitem__ passes to them. np.s_ is nearly as simple (the code is easy to find and read).

Normally ast.literal_eval is used as a 'safer' alternative to eval, but it only handles strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None, so it does not accept slice syntax such as :1.
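A quick sketch of that limitation, using only the standard library. The helper literal_or_error is hypothetical and just reports which exception, if any, was raised:

```python
import ast


def literal_or_error(text):
    # Hypothetical helper: return the evaluated literal, or the exception name.
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        return type(exc).__name__


print(literal_or_error('[None, 2, -32]'))   # plain literals are accepted
print(literal_or_error('[None, :1, 3:4]'))  # slice syntax is not a valid literal
```

The second string fails because slice notation is only valid inside an indexing expression, so it cannot be parsed as a standalone list literal.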

hpaulj