4

I am coming from an R background and am trying to figure out a way to access several elements of a list given their indices. A simple example is below:

from operator import itemgetter

my_list = ["a", "b", "c", "d", "e", "f", "g"]
my_elements = itemgetter(*[1,2])(list(my_list))
my_elements

This returns the elements at indices 1 and 2 (Python counts from zero, so these are "b" and "c") -- great! But I run into problems when I want to specify a sequence of integers to pull. In R, I would write:

my_list = c("a", "b", "c", "d", "e", "f", "g")
my_elements = my_list[c(1,3:5)]
my_elements

How would I do the equivalent in python? I have tried something like:

my_elements = itemgetter(*[1, list(range(3,6))])(list(my_list))

But I have to convert the range object and then it adds a list of numbers to the list rather than the sequence of numbers directly. I am new to Python but I feel like there must be a very simple way of doing this that I am overlooking?

M--
niccalis
  • I do not understand what you mean by "But I have to concert the range object and then it adds a list of numbers to the list rather the sequence of numbers directly. ". Note, you are doing a lot of weird stuff. For example, `itemgetter(*[1,2])(list(my_list))` can just be written `itemgetter(1,2)(my_list)`, similarly `itemgetter(*[1, list(range(3,6))])(list(my_list))` can just be `itemgetter(1, *range(3,6))(my_list)` – juanpa.arrivillaga Dec 05 '19 at 21:46
  • Does this answer your question? [Understanding slice notation](https://stackoverflow.com/questions/509211/understanding-slice-notation) – mkrieger1 Dec 05 '19 at 21:49

4 Answers

5

It may be overkill, but if you are coming from R, you may want to consider the numpy/pandas libraries for the sort of functionality you are used to. Using a numpy.ndarray instead of a list object, you can write:

>>> import numpy as np
>>> arr = np.array(["a", "b", "c", "d", "e", "f", "g"])
>>> arr[np.r_[1, 3:6]]
array(['b', 'd', 'e', 'f'],
      dtype='<U1')

The indexing for numpy/pandas data structures will be more familiar to an R user. Python is a general-purpose language rather than a domain-specific statistical one, so this sort of fancy indexing isn't built in.
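
For comparison, here is a minimal plain-Python sketch of the same selection without numpy. It assumes zero-based indices and half-open ranges, as in the example above; the variable names are only illustrative.

```python
my_list = ["a", "b", "c", "d", "e", "f", "g"]

# Unpack the range next to the single index to get one flat list of indices.
indices = [1, *range(3, 6)]  # [1, 3, 4, 5]

# Look up each index in turn; order and duplicates are preserved.
my_elements = [my_list[i] for i in indices]
print(my_elements)  # ['b', 'd', 'e', 'f']
```

Note that R's c(1, 3:5) is one-based with inclusive endpoints, so its direct counterpart here would be [0, *range(2, 5)].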

juanpa.arrivillaga
3

Examples of basic indexing and slicing:

my_list = ["a", "b", "c", "d", "e", "f", "g"]

print(my_list[1])  # indexing: get the second item
print(my_list[:4:2])  # slicing: get every second item among the first four

# getting several items from different positions
my_list[1:2] + my_list[4:6]  # list concatenation

Actually, this explanation is very nice: Understanding slice notation (https://stackoverflow.com/questions/509211/understanding-slice-notation)

Examples of custom slicing:

from operator import itemgetter

itemgetter(2, 5, 3)(my_list)

lst_ids = [2,5,3]
getter = itemgetter(*lst_ids)
new_list = list(getter(my_list))
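
If single indices and ranges need to be combined in one statement, one option (suggested in the comments under the question) is to unpack the ranges directly into the itemgetter call. A small sketch, repeating my_list so that it runs on its own:

```python
from operator import itemgetter

my_list = ["a", "b", "c", "d", "e", "f", "g"]

# Single indices and unpacked ranges can be mixed freely in one call.
picked = itemgetter(1, *range(3, 6))(my_list)
print(picked)  # ('b', 'd', 'e', 'f')

# With several indices itemgetter returns a tuple; wrap it in list() if needed.
picked_list = list(picked)
```

Keep in mind that itemgetter called with exactly one index returns the bare item rather than a one-element tuple.
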
sammy
  • The question is about finding an equivalent to R's `my_list[c(1,3:5)]`, which returns a list containing the items with indices 1, 3, 4, 5(?). It accepts a mix of any number of indices and slices. – Thierry Lathuille Dec 05 '19 at 21:51
  • Is there a way to convert the above logic into a single statement? My issue is I will be having a lot of combinations of slices and individual indices so I would like to avoid repeatedly adding lists. – niccalis Dec 05 '19 at 21:51
  • @niccalis It might be worth it to describe your program a bit more, provide some context. – AMC Dec 05 '19 at 22:19
1

Good question! The R syntax, if taken as Python pseudocode, would mean "take the element in my_list indexed by a tuple whose elements are an integer and a slice." Unfortunately it is regarded as syntactically incorrect, since slices are only allowed in very specific contexts in Python. We should therefore perhaps look for some way to achieve the same ends in the existing language.

About the best I've come up with as I wait for the evening news to start is a function that takes a string argument which, if composed of the syntax you quote, should serve.

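def select(lst, indices):
    """Yield the items of lst named by a string such as "0, 2:4, 5"."""
    indices = indices.split(",")
    for i_string in indices:
        if ":" in i_string:
            # "s:e" selects positions s up to, but not including, e
            s, e = (int(x) for x in i_string.split(":"))
            for i in range(s, e):
                yield lst[i]
        else:
            # a bare number selects a single position
            yield lst[int(i_string)]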


print([x for x in select(['a', 'b', 'c', 'd', 'e', 'f'],
                   "0, 2:4, 5")])

I have taken the liberty of retaining Python indexing conventions, since to do otherwise would be something of a perversion of the language. As a result the code prints ['a', 'c', 'd', 'f'], which I hope is explicable if not satisfactory.

It would, of course, be possible to define a class that inherited from list and install something similar as its __getitem__ method for string indices. It would, however, then need to be modified to delegate non-string indices to list.__getitem__, a fairly simple adaptation. This would lose the somewhat ugly need to extract the elements in a comprehension.
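
A rough sketch of that list subclass might look as follows. The class name SelectableList is made up for illustration, and the string format is assumed to be the same "index, start:stop" form accepted by select above.

```python
class SelectableList(list):
    """Hypothetical list subclass that also accepts strings like "0, 2:4, 5"."""

    def __getitem__(self, index):
        # Anything that is not a string goes to the normal list machinery.
        if not isinstance(index, str):
            return super().__getitem__(index)
        result = []
        for part in index.split(","):
            if ":" in part:
                # "s:e" selects positions s up to, but not including, e.
                start, stop = (int(x) for x in part.split(":"))
                for i in range(start, stop):
                    result.append(super().__getitem__(i))
            else:
                result.append(super().__getitem__(int(part)))
        return result


letters = SelectableList(["a", "b", "c", "d", "e", "f"])
print(letters["0, 2:4, 5"])  # ['a', 'c', 'd', 'f']
print(letters[1:3])          # ['b', 'c']
```

Ordinary integer and slice indexing keep working as before, since they are passed straight through to list.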

I'm aware that the speed of this technique won't be anything like that of "native" Python code implemented in C, but it could be improved somewhat by implementing the same facilities as a compiled extension.

There may also be features in numpy that could help, given that it's possible to select from a numpy array using a conformant array of Booleans. Others may know that ecosystem well enough to make better suggestions.

holdenweb
  • Naturally this code would need upgrading for "production" use to trap and handle exceptions relating to (e.g.) data format and list bounds errors. My assumption was this feature would mostly be used for data preparation, and that speed would not be a requirement. – holdenweb Dec 06 '19 at 10:47
1

You can use the same syntax as in R's c, at the expense of an additional function call and a slight change in syntax, like this:

my_elements = slices(my_list, c[1, 3:5, 2:4, 9])

The trick is to use a c object having a __getitem__ method. We can put anything we want between the brackets [], and it will be passed to __getitem__.

If we pass a mix of indices and slices, we'll get a tuple of integers and slice objects.

From there, our special __getitem__ will return a list of single indices that the slices function can use to extract the corresponding items from our list.

class C:
    def __getitem__(self, idx_and_slices):
        # A single index or slice arrives bare, so wrap it in a tuple.
        if not isinstance(idx_and_slices, tuple):
            idx_and_slices = (idx_and_slices,)
        indices = []
        for x in idx_and_slices:
            if isinstance(x, int):
                indices.append(x)
            elif isinstance(x, slice):
                # Needs explicit start and stop; any step is ignored.
                indices.extend(range(x.start, x.stop))
        return indices

c = C()

def slices(lst, indices):
    return [lst[i] for i in indices]

Usage:

my_list = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

my_elements = slices(my_list, c[1, 3:5, 2:4, 9])
print(my_elements)
# ['b', 'd', 'e', 'c', 'd', 'j']
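
To check the claim above about what arrives in __getitem__, a throwaway class (called Probe here purely for illustration) can simply hand its argument back:

```python
class Probe:
    def __getitem__(self, key):
        # Return the key unchanged so it can be inspected.
        return key


probe = Probe()
print(probe[1, 3:5])  # (1, slice(3, 5, None))
print(probe[3:5])     # slice(3, 5, None)
print(probe[2])       # 2
```

A single index or slice is not wrapped in a tuple, which is why C.__getitem__ starts with the isinstance check.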

More fun!

We could use an even "stranger", but shorter syntax, by also passing our list in the brackets, as in:

my_elements = d[my_list, 1, 3:5, 2:4, 9]

The __getitem__ method of d will get a tuple with the list as its first item, followed by the indices and slices, and will return the selected items directly:

class D:
    def __getitem__(self, lst_idx_and_slices):
        lst = lst_idx_and_slices[0]
        idx_and_slices = lst_idx_and_slices[1:]
        out = []
        for x in idx_and_slices:
            if isinstance(x, int):
                out.append(lst[x])
            elif isinstance(x, slice):
                out.extend(lst[x.start:x.stop:x.step])
        return out

d = D()

We would use it like this:

my_list = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

my_elements = d[my_list, 1, 3:5, 2:4, 9]
print(my_elements)
# ['b', 'd', 'e', 'c', 'd', 'j']
Thierry Lathuille