I am currently using a for loop with enumerate to extract values from the list of tuples below:

[(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]

What I want is to end up with the following tuple: ('handle', 'Firstname', 'Surname', 'Callname', 'Gender')

What would be the most efficient way of accomplishing this without enumerating through them and creating a new tuple or is this the only way?

Chris James

3 Answers

5

Create a new tuple by enumerating through them:

tuple(t[1] for t in inputlist)

This uses a generator expression to pass each second element from the tuples in inputlist to the tuple() constructor.

If you just need a sequence and a list would do, then use a list comprehension:

[t[1] for t in inputlist]

Lists fit arbitrary-length, ordered, homogeneous data sets (such as you have here) better than tuples do; see "What's the difference between lists and tuples?"

If raw speed is required and readability can be de-emphasised, use map() and operator.itemgetter() to move iteration and extraction to optimised C code:

from operator import itemgetter

labels_tup = tuple(map(itemgetter(1), inputlist))
labels_list = list(map(itemgetter(1), inputlist))

However, I'd avoid doing this unless extracting a bunch of strings out of a list of tuples is on a critical path and / or repeated a lot. Readability counts!
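
For readers who have not met operator.itemgetter() before, here is a minimal sketch of what the callable does on its own, using one row from the question. The names get_name and get_name_and_type are illustrative only and do not appear in the original answer.

```python
from operator import itemgetter

row = (0, 'handle', 'VARCHAR(50)', 1, None, 1)

# itemgetter(1) builds a callable that behaves like `lambda t: t[1]`.
get_name = itemgetter(1)
print(get_name(row))           # handle

# Given several indices, the callable returns a tuple of the picked items.
get_name_and_type = itemgetter(1, 2)
print(get_name_and_type(row))  # ('handle', 'VARCHAR(50)')
```

Passing such a callable to map() is what lets the per-item lookup run without a Python-level function call for each tuple.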

without enumerating through them and creating a new tuple

You can't avoid this. You a) want one element from each tuple in a sequence, and b) need a tuple object as output, which is an immutable type. While you could write 5 separate statements indexing into inputlist to access each value, doing so would not be efficient, would create needlessly repeated code, and would break the moment your input doesn't have exactly 5 elements.
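
To make the fragility of hand-written indexing concrete, here is a rough sketch comparing it with the generator expression. Both helper names (names_hardcoded, names_generic) are hypothetical and exist only for this illustration.

```python
inputlist = [
    (0, 'handle', 'VARCHAR(50)', 1, None, 1),
    (1, 'Firstname', 'TEXT', 1, None, 0),
    (2, 'Surname', 'TEXT', 1, None, 0),
    (3, 'Callname', 'TEXT', 1, None, 0),
    (4, 'Gender', 'INTEGER', 1, None, 0),
]


def names_hardcoded(rows):
    # Five explicit index expressions: only works for at least 5 rows,
    # and silently ignores any rows beyond the fifth.
    return (rows[0][1], rows[1][1], rows[2][1], rows[3][1], rows[4][1])


def names_generic(rows):
    # One pass over however many rows there are.
    return tuple(row[1] for row in rows)


# With exactly five rows the two agree.
print(names_hardcoded(inputlist) == names_generic(inputlist))  # True

# With fewer rows the hard-coded version raises IndexError.
shorter = inputlist[:3]
print(names_generic(shorter))  # ('handle', 'Firstname', 'Surname')
try:
    names_hardcoded(shorter)
except IndexError:
    print('hard-coded indexing failed on a 3-row input')
```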

Demo:

>>> inputlist = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]
>>> tuple(t[1] for t in inputlist)
('handle', 'Firstname', 'Surname', 'Callname', 'Gender')
>>> [t[1] for t in inputlist]
['handle', 'Firstname', 'Surname', 'Callname', 'Gender']
Martijn Pieters
  • Interestingly enough we came to the same conclusion so it's good my thoughts have been validated – jamylak Mar 01 '20 at 14:04
  • @jamylak: in this case, it's faster. Not in all cases, however; for a [recent codereview answer](https://codereview.stackexchange.com/a/237906) I could at best squeeze a few percentage points out of itertools (not added to that answer, it wasn't worth the trouble). – Martijn Pieters Mar 01 '20 at 14:09
3

You are looking for a generator expression.

print(tuple(i[1] for i in inputlist))

Or

t = tuple(i[1] for i in inputlist)
print(t)

Outputs:

('handle', 'Firstname', 'Surname', 'Callname', 'Gender')

A possible solution with a for loop (not recommended):

li = []
for i in inputlist:
    li.append(i[1])
print(tuple(li))

What would be the most efficient way of accomplishing this without enumerating through them and creating a new tuple or is this the only way?

I am not sure why you want to avoid creating a tuple, but you don't need enumerate. Maybe the example given below can help:

def getElement(ndx):
    return inputlist[ndx][1]

# Get the name stored in the tuple at index 2 (the third tuple, 'Surname')
print(getElement(2))
abhiarora
2
>>> from operator import itemgetter
>>> data = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]
>>> tuple(map(itemgetter(1), data))
('handle', 'Firstname', 'Surname', 'Callname', 'Gender')

This seems to be the fastest in raw speed, though only slightly, since it keeps as much of the work as possible in C. I also like the look of it. Of course, you are still looping through the elements.

Timings:

$ python3 -m timeit -s "data = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]; from operator import itemgetter;" "tuple(map(itemgetter(1), data))"
500000 loops, best of 5: 477 nsec per loop
$ python3 -m timeit -s "data = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]; from operator import itemgetter;" "tuple(t[1] for t in data)"
500000 loops, best of 5: 566 nsec per loop
$ python3 -m timeit -s "data = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]*1000; from operator import itemgetter;" "tuple(map(itemgetter(1), data))"
2000 loops, best of 5: 146 usec per loop
$ python3 -m timeit -s "data = [(0, 'handle', 'VARCHAR(50)', 1, None, 1), (1, 'Firstname', 'TEXT', 1, None, 0), (2, 'Surname', 'TEXT', 1, None, 0), (3, 'Callname', 'TEXT', 1, None, 0), (4, 'Gender', 'INTEGER', 1, None, 0)]*1000; from operator import itemgetter;" "tuple(t[1] for t in data)"
1000 loops, best of 5: 212 usec per loop
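
If you would rather rerun the comparison from a script than from the shell, a sketch along these lines should work. The function name time_extractors and its parameters are made up for this example, and the lambda wrappers add a little call overhead that the shell commands above do not have, so expect the absolute numbers to differ from those shown and to vary by machine and Python version.

```python
import timeit
from operator import itemgetter

rows = [
    (0, 'handle', 'VARCHAR(50)', 1, None, 1),
    (1, 'Firstname', 'TEXT', 1, None, 0),
    (2, 'Surname', 'TEXT', 1, None, 0),
    (3, 'Callname', 'TEXT', 1, None, 0),
    (4, 'Gender', 'INTEGER', 1, None, 0),
]


def time_extractors(data, number=2000, repeat=5):
    """Return the best observed per-call time in seconds for each approach."""
    # Built once here; the shell commands above rebuild it on every loop.
    get_name = itemgetter(1)
    candidates = {
        'map + itemgetter': lambda: tuple(map(get_name, data)),
        'generator expression': lambda: tuple(t[1] for t in data),
    }
    best = {}
    for label, func in candidates.items():
        # timeit.repeat returns one total per repetition; keep the fastest.
        totals = timeit.repeat(func, number=number, repeat=repeat)
        best[label] = min(totals) / number
    return best


for size, data in (('5 rows', rows), ('5000 rows', rows * 1000)):
    for label, seconds in time_extractors(data, number=200).items():
        print(f'{size:>9}  {label:<22} {seconds * 1e6:8.3f} usec per call')
```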
jamylak