
Is there a "straightforward" way to convert a str containing numbers into a list of [x,y] ints?

# from: '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3,5'
# to: [[5, 4], [2, 4], [1, 0], [3, 0], [5, 1], [3, 3], [14, 32], [3, 5]]

By the way, the following works, but I wouldn't call it straightforward... Also, it can be assumed that the input str has been validated to make sure that it only contains an even number of numbers separated by commas.

num_str = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3,5'
numpairs_lst = []      # ends up as [[5, 4], [2, 4], [1, 0], ...]

current_num_str = ''   # the current num within the str; stop when a comma is found
xy_pair = []           # this is one of the [x,y] pairs -> [5, 4] 
for ix,c in enumerate(num_str):
    if c == ',':
        xy_pair.append(int(current_num_str))
        current_num_str = ''
        if len(xy_pair) == 2:
            numpairs_lst.append(xy_pair)
            xy_pair = []
    else:
        current_num_str += c

# and, take care of last number...
xy_pair.append(int(current_num_str))
numpairs_lst.append(xy_pair)
jd.

11 Answers


There are two important one-line idioms in Python that help make this "straightforward".

The first idiom uses zip(). From the Python documentation:

The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n).

So applying to your example:

>>> num_str = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3,5'
>>> zip(*[iter(num_str.split(","))]*2)
[('5', '4'), ('2', '4'), ('1', '0'), ('3', '0'), ('5', '1'), 
('3', '3'), ('14', '32'), ('3', '5')]

That produces tuples each of length 2.

If you want the length of the sub elements to be different:

>>> zip(*[iter(num_str.split(","))]*4)
[('5', '4', '2', '4'), ('1', '0', '3', '0'), ('5', '1', '3', '3'), 
('14', '32', '3', '5')]

The second idiom is list comprehensions. If you want sub elements to be lists, wrap in a comprehension:

>>> [list(t) for t in zip(*[iter(num_str.split(","))]*4)]
[['5', '4', '2', '4'], ['1', '0', '3', '0'], ['5', '1', '3', '3'], 
['14', '32', '3', '5']]
>>> [list(t) for t in zip(*[iter(num_str.split(","))]*2)]
[['5', '4'], ['2', '4'], ['1', '0'], ['3', '0'], ['5', '1'], ['3', '3'], 
['14', '32'], ['3', '5']]

zip() truncates any sub element group that is not complete. So if the number of elements in your string is not a multiple of 2, for example, you will lose the last element.

If you want to keep sub elements that are not complete (i.e., if the number of elements in num_str is not a multiple of the sub element length), use a slice idiom:

>>> l=num_str.split(',')
>>> [l[i:i+2] for i in range(0,len(l),2)]
[['5', '4'], ['2', '4'], ['1', '0'], ['3', '0'], ['5', '1'], 
['3', '3'], ['14', '32'], ['3', '5']]
>>> [l[i:i+7] for i in range(0,len(l),7)]
[['5', '4', '2', '4', '1', '0', '3'], ['0', '5', '1', '3', '3', '14', '32'], 
['3', '5']]
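The same contrast in Python 3 (an assumption: there zip() returns an iterator, so it is consumed by a comprehension), using a deliberately odd-length input that is not from the question:

```python
# Python 3 sketch: zip() drops an incomplete final group,
# while the slice idiom keeps it.
num_str = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3'  # odd count on purpose
l = num_str.split(',')

zipped = [list(t) for t in zip(*[iter(l)] * 2)]     # trailing '3' is lost
sliced = [l[i:i + 2] for i in range(0, len(l), 2)]  # trailing '3' kept as ['3']

print(zipped[-1])  # ['14', '32']
print(sliced[-1])  # ['3']
```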

If you want each element to be an int, you can apply that prior to the other transforms discussed here:

>>> nums=[int(x) for x in num_str.split(",")]
>>> zip(*[iter(nums)]*2)
# etc etc etc
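For reference, a minimal Python 3 sketch of the complete int-pairs pipeline; the only assumption beyond the snippets above is that zip() is lazy in Python 3, so its result is consumed by a list comprehension:

```python
num_str = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3,5'

# Convert every field to int first, then cluster into pairs.
nums = [int(x) for x in num_str.split(',')]

# The same iterator repeated twice: zip() pulls consecutive items into each tuple.
pairs = [list(t) for t in zip(*[iter(nums)] * 2)]
print(pairs)
# [[5, 4], [2, 4], [1, 0], [3, 0], [5, 1], [3, 3], [14, 32], [3, 5]]
```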

As pointed out in the comments, with Python 2.4+ you can also replace the list comprehension with a generator expression by replacing the [ ] with ( ), as in:

>>> nums=(int(x) for x in num_str.split(","))
>>> zip(nums,nums)
[(5, 4), (2, 4), (1, 0), (3, 0), (5, 1), (3, 3), (14, 32), (3, 5)]
# or map(list,zip(nums,nums)) for the list of lists version...

If your string is long and you only need groups of 2, this is more efficient, since it never builds the intermediate list of ints.
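A Python 3 sketch of the same approach, assuming Python 3's lazy map() in place of the generator expression; note that split() still builds the list of strings:

```python
num_str = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3,5'

# map() is lazy in Python 3, like the generator expression above.
nums = map(int, num_str.split(','))

# Passing the same iterator twice makes zip() take consecutive items.
pairs = [list(t) for t in zip(nums, nums)]
print(pairs)
```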

dawg
  • Your first solution is most Pythonic, I think. (except you left out map(int) to convert strings to ints as OP requested) – PaulMcG Feb 22 '11 at 22:13
  • To get tuples of numbers instead of strings, you can use `zip(*[imap(int, num_str.split(","))]*2)` (using `itertools.imap()`). – Sven Marnach Feb 22 '11 at 22:18
  • Regarding the last code snippet: You can use `()` instead of `[]` in the first line and `zip(nums, nums)` as second line. – Sven Marnach Feb 22 '11 at 22:57
  • @Sven Marnach: Thanks! I was so wrapped in explaining I did not notice the substantial improvement that you suggested. – dawg Feb 22 '11 at 23:09
  • The problem with the generator in this case is that a generator does not have a length and is not subscriptable; therefore, you cannot use the "slice idiom" that supports partial sub lists. Given that the string is already in memory and the resulting list will be in memory, the generator is more theoretical than practical IMHO. +1 tho, your Python is improving dude! – the wolf Feb 23 '11 at 03:29
  • @Johnsyweb: Help me understand the comment? – dawg Feb 23 '11 at 03:42
  • @carrot-top: Thanks (I think). I still write Perl in Python from time to time, but do find it a very productive language. – dawg Feb 23 '11 at 03:43
  • @drewk: `this` refers to [The Zen of Python](http://www.python.org/dev/peps/pep-0020/). Have it in the back of your mind with every line you write. – johnsyweb Feb 23 '11 at 13:02
  • @Johnsyweb: I know what it refers to. Why is it applicable here? Was there something Zen-like or not so Zen-like about these code snippets? :D – the wolf Feb 23 '11 at 16:24
  • @carrot-top, @drewk: in lines like `[list(t) for t in zip(*[iter(num_str.split(","))]*2)]`, I would say you're breaking "Readability counts", "Simple is better than complex" and "Beautiful is better than ugly". Try to resist doing *everything* on one line [I saw your comment about Perl :-)]. – johnsyweb Feb 23 '11 at 19:26
  • @Johnsyweb: I actually think that list comprehensions and genexp are one of the truly *beautiful* features of Python -- I love them! Perl has similar constructs; more flexible if you understand them; far *less* readable than the Python equivalent. The thing that is hard for me to get (subjective interpretation of the Zen and a Perl background) is the bias in Python for small helper functions. Rather than everything in 1 line, one must trace through the little functions. Trade-off I guess. Thanks for the comment tho. I learn more every day thanks to helpful comments. :-} – dawg Feb 23 '11 at 20:29
  • @drewk: I completely agree. The good things about small helper functions are that they're very easy to unit-test and (with meaningful names) they make it easy to read *what* you are doing (not *how* you are doing it). Hence if you search the www for "executable pseudocode", you get lots of hits about Python! – johnsyweb Feb 23 '11 at 21:11
  • That's a keeper, thorough and detailed answer +1 :) – zx81 Jun 11 '14 at 23:58

One option:

>>> num_str = '5,4,2,4,1,0,3,0,5,1,3,3,4,3,3,5'
>>> l = num_str.split(',')
>>> zip(l[::2], l[1::2])
[('5', '4'), ('2', '4'), ('1', '0'), ('3', '0'), ('5', '1'), ('3', '3'), ('4', '3'), ('3', '5')]

Reference: str.split(), zip(), General information about sequence types and slicing

If you actually want integers, you could convert the list to integers first using map:

>>> l = map(int, num_str.split(','))

Explanation:

split creates a list of the individual elements. The trick is the slicing: the syntax is list[start:end:step]. l[::2] returns every second element starting from the first one (so the first, third, ...), whereas the second slice l[1::2] returns every second element starting from the second one (so the second, fourth, ...).
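To make the two slices concrete, here is what they produce for the list above (a small Python 3 sketch; the names evens and odds are just illustrative):

```python
l = '5,4,2,4,1,0,3,0,5,1,3,3,4,3,3,5'.split(',')

evens = l[::2]   # indices 0, 2, 4, ...
odds = l[1::2]   # indices 1, 3, 5, ...
print(evens)  # ['5', '2', '1', '3', '5', '3', '4', '3']
print(odds)   # ['4', '4', '0', '0', '1', '3', '3', '5']

# In Python 3, zip() is lazy; list() materializes the pairs.
print(list(zip(evens, odds))[:2])  # [('5', '4'), ('2', '4')]
```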

Update: If you really want lists, you could use map again on the result list:

>>> xy_list = zip(l[::2], l[1::2])
>>> xy_list = map(list, xy_list)

Note that @Johnsyweb's answer is probably faster, as it walks the list once with a single iterator instead of first building two sliced copies. The actual difference, of course, depends on the size of the list.

Felix Kling
  • @A A: What exactly do you want to know? – Felix Kling Feb 22 '11 at 20:07
  • Hi Felix, this worked - thanks. This is minor but, can we make zip() return 2-item lists instead of tuples? – jd. Feb 22 '11 at 20:07
  • @AA When slicing a list, you can specify (up to) three parameters separated by colons. The first is the start of your slice (by default 0), the second is the end of your slice (by default the end of the list) and the third is the step (by default 1). So, ::2 slices the list from 0 to the end, taking every other element, while 1::2 slices from index 1 to the end, taking every other element. – Wilduck Feb 22 '11 at 20:11
  • @jd: Why do you need lists? Generally, the only difference is that lists are mutable - do you need this? –  Feb 22 '11 at 20:11
  • @delnan: Not necessary, but i wanted to leave the possibility open of someone adding an extra element to the 2-elem lists somewhere down the pipeline. – jd. Feb 22 '11 at 20:15
  • @jd `list_of_lists = [list(tuple_pair) for tuple_pair in list_of_tuples]` – Wilduck Feb 22 '11 at 20:16
  • @wilduck thanks - guillermo's answer below is similar - also neat. – jd. Feb 22 '11 at 20:24
#!/usr/bin/env python

from itertools import izip

def pairwise(iterable):
    "s -> (s0,s1), (s2,s3), (s4, s5), ..."
    a = iter(iterable)
    return izip(a, a)

s = '5,4,2,4,1,0,3,0,5,1,3,3,4,3,3,5'
fields = s.split(',')
print [[int(x), int(y)] for x,y in pairwise(fields)]

Taken from @martineau's answer to my question, which I have found to be very fast.

Output:

[[5, 4], [2, 4], [1, 0], [3, 0], [5, 1], [3, 3], [4, 3], [3, 5]]
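A Python 3 adaptation, assuming Python 3, where itertools.izip no longer exists and the built-in zip() is already lazy. Be aware that Python 3.10 added itertools.pairwise, which yields overlapping pairs (s0, s1), (s1, s2), ..., so it is not a drop-in replacement for this helper:

```python
def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    # Python 3's zip() is lazy, so it takes the place of itertools.izip.
    return zip(a, a)

s = '5,4,2,4,1,0,3,0,5,1,3,3,4,3,3,5'
fields = s.split(',')
result = [[int(x), int(y)] for x, y in pairwise(fields)]
print(result)
# [[5, 4], [2, 4], [1, 0], [3, 0], [5, 1], [3, 3], [4, 3], [3, 5]]
```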
johnsyweb

First, use split to make a list of numbers (as in all of the other answers).

num_list = num_str.split(",")

Then, convert to integers:

num_list = [int(i) for i in num_list]

Then, use the grouper recipe from the itertools documentation:

from itertools import izip_longest
def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

pair_list = grouper(2, num_list)

Of course, you can compress this into a single line if you're frugal:

pair_list = grouper(2, [int(i) for i in num_str.split(",")])
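In Python 3 the recipe's izip_longest is named itertools.zip_longest. A sketch with a deliberately odd-length input (hypothetical, not from the question) shows that, unlike plain zip(), the incomplete group is padded with fillvalue instead of being dropped:

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

num_str = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3'  # odd count on purpose
pairs = list(grouper(2, [int(i) for i in num_str.split(',')]))
print(pairs[-1])  # (3, None): the incomplete group is padded
```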
Andrew Jaffe
>>> num_str = '5,4,2,4,1,0,3,0,5,1,3,3,4,3,3,5'
>>> inums = iter([int(x) for x in num_str.split(',')])
>>> [[x, inums.next()] for x in inums]
[[5, 4], [2, 4], [1, 0], [3, 0], [5, 1], [3, 3], [4, 3], [3, 5]]
>>>
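In Python 3, iterators no longer have a .next() method; the built-in next() does the same job. A sketch under that assumption, which also expects an even number of fields (an odd count would make next() raise StopIteration):

```python
num_str = '5,4,2,4,1,0,3,0,5,1,3,3,4,3,3,5'

# A generator is already an iterator, so no iter() call is needed.
inums = (int(x) for x in num_str.split(','))

# The for clause takes x; next(inums) takes the following item from the same iterator.
pairs = [[x, next(inums)] for x in inums]
print(pairs)
```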
John Machin
  • 1. You can omit the `iter` and the square brackets in the second line, and it will still work. 2. `next(inums)` is preferable over `inums.next()`, since this would make the solution work in Python 3.x as well. 3. If you are fine with tuples instead of lists, the last line can be written `zip(inums, inums)`. – Sven Marnach Feb 22 '11 at 22:09
  • @Sven Marnach: (1) & (2): You are correct, for recent Pythons; my code is often conditioned by supporting a package on 2.1 through 2.7 :-) (3) I'm fine with tuples, but the OP wanted lists. – John Machin Feb 22 '11 at 22:14
  • I also experienced that supporting Python 2.1 seems more important in practice than supporting 3.x :) – Sven Marnach Feb 22 '11 at 22:22

EDIT: @drewk cleaned this up to handle even or odd length lists:

>>> f = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3,5'
>>> li = [int(n) for n in f.split(',')]
>>> [li[i:i+2] for i in range(0, len(li), 2)]
[[5, 4], [2, 4], [1, 0], [3, 0], [5, 1], [3, 3], [14, 32], [3, 5]]
Bluu

This is a more general function that works for different chunk sizes and appends the remainder if needed:

def breakup(mylist, chunks):
    mod = len(mylist) % chunks
    if mod == 0:
        ae = []
    elif mod == 1:
        ae = mylist[-1:]
    else:
        ae = [tuple(mylist[-mod:])]
    return zip(*[iter(mylist)]*chunks) + ae

num_str = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3,5'
lst = map(int,num_str.split(','))
print breakup(lst,2)

OUT: [(5, 4), (2, 4), (1, 0), (3, 0), (5, 1), (3, 3), (14, 32), (3, 5)]
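The function relies on Python 2's zip() returning a list; in Python 3, zip() returns an iterator and `+ ae` raises TypeError. A Python 3 sketch of the same idea, with one deliberate difference from the original: a single leftover element is also wrapped in a one-item tuple, so every group has the same type:

```python
def breakup(mylist, chunks):
    """Group mylist into tuples of length `chunks`, keeping any remainder."""
    full = list(zip(*[iter(mylist)] * chunks))  # complete groups only
    mod = len(mylist) % chunks
    if mod == 0:
        return full
    # Append the leftover items as one shorter tuple.
    return full + [tuple(mylist[-mod:])]

num_str = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3,5'
lst = [int(x) for x in num_str.split(',')]
print(breakup(lst, 2))
print(breakup(lst, 6))  # [(5, 4, 2, 4, 1, 0), (3, 0, 5, 1, 3, 3), (14, 32, 3, 5)]
```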

Feczo

You can shorten the first part (converting "1,2,3" to [1, 2, 3]) by using the split function and converting each piece with int():

num_list = [int(n) for n in num_str.split(",")]

There might be an easier way to get pairs, but I'd do something like this:

xy_pairs = []
for i in range(0, len(num_list), 2):
    x = num_list[i]
    y = num_list[i + 1]
    xy_pairs.append([x, y])

Also, since these are all lists of a defined length (2), you should probably use a tuple:

xy_pairs.append((x, y))
Brendan Long

It may be interesting to have a generator. Here's a generator expression:

import re
ch = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3,5'
genexp = ( map(int,ma.groups()) for ma in re.finditer(r'(\d+)\s*,\s*(\d+)',ch) )
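A Python 3 version, assuming Python 3's lazy map() (wrapped in list() so each pair is a list) and a pre-compiled pattern; the regex itself is unchanged:

```python
import re

ch = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3,5'

# Raw string so \d and \s reach the regex engine unchanged.
pattern = re.compile(r'(\d+)\s*,\s*(\d+)')

# Each match consumes two numbers and the comma between them; the comma after
# the pair is skipped because the next search starts right after the match.
genexp = (list(map(int, ma.groups())) for ma in pattern.finditer(ch))
pairs = list(genexp)
print(pairs)
```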
eyquem
#declare the string of numbers
str_nums = '5,4,2,4,1,0,3,0,5,1,3,3,14,32,3,5'

#zip two lists: the even elements with the odd elements, casting the strings to integers
zip([int(str_nums.split(',')[i]) for i in range(0,len(str_nums.split(',')),2)],[int(str_nums.split(',')[i]) for i in range(1,len(str_nums.split(',')),2)])

"""
Of course you would want to clean this up with some intermediate variables, but one-liners like this are why I love Python :)
"""
mik01aj

Maybe this?

a = "0,1,2,3,4,5,6,7,8,9".split(",")
[[int(a.pop(0)), int(a.pop(0))] for x in range(len(a)/2)]
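As the comments below note, list.pop(0) shifts every remaining element, so this loop is quadratic. A sketch of the same pop-from-the-front idea using collections.deque, whose popleft() is O(1) (a swapped-in data structure, not the answer's own):

```python
from collections import deque

a = deque("0,1,2,3,4,5,6,7,8,9".split(","))
pairs = []
# deque.popleft() is O(1), unlike list.pop(0), which shifts every remaining item.
# List display elements are evaluated left to right, so the first popleft() is x.
while len(a) >= 2:
    pairs.append([int(a.popleft()), int(a.popleft())])
print(pairs)  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```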
guillermooo
  • john, i just saw your answer and seems quite good as well. why do you say that guillermo's answer is bad? to me it seems concise and yet clear. – jd. Feb 22 '11 at 21:56
  • @jd: `a.pop(0)` is least efficient (N**2 behaviour) and most obscure. – John Machin Feb 22 '11 at 22:09
  • `range(len(a)/2)` Will create a big list of numbers over which to iterate and then discard (in Python < 3). – johnsyweb Feb 22 '11 at 22:26
  • is it really n**2; seems more like O(n); however, agree that it's not the most efficient - no need to create a range and neither to discard elements. drewk's answer illustrates several good features. – jd. Feb 22 '11 at 22:39
  • @jd: Yes, in general `a.pop(i)` has to move each of the pointers for the `a[i+1:]` items down a slot. So `a.pop(0)` is O(N), and it's done N times. – John Machin Feb 23 '11 at 11:14