How to split a string-type array by value

Question

Say I've got an array of strings:

['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']

And I want to split it by value, in this case by '\n', so it becomes:

[['12.5',  '7', '45'],
 ['13.7', '52', '34.3']]

I don't want to enumerate every element, since that's time-consuming when the input is large. So I wonder if there are some functions or Python tricks that can achieve this easily.

P.S.

I've seen this question, but it doesn't help much, mainly because I don't quite understand how np.where() works with np.split(), and also because I'm working with strings.

Another thing that might be helpful: my final goal is to generate a matrix of numbers (maybe of float type), so I'd also be glad to know if there's a numpy function that can do this.

Even if you don't want to use a loop to iterate through your elements and you prefer using "some functions or python tricks that can easily achieve this", these tools you are looking for **will** use a loop. So why not use one yourself for such a basic operation? — IMCoins, Jan 22 '18 at 08:36
@IMCoins I learned from some courses that many packages compute matrices on the GPU, which is faster than implementing it myself with an explicit `for` loop. — Amarth Gûl, Jan 22 '18 at 08:38
@AmarthGûl Unfortunately, most of the packages that do that are 3rd party packages, and a loop is usually your best bet because it is implemented in C. — cs95, Jan 22 '18 at 08:45
@cᴏʟᴅsᴘᴇᴇᴅ Well, when implementing matrix computations, I found `numpy` functions are much faster than operations written by myself, so I was hoping `numpy` could save me again. Now it seems you're right: the answers below still use `for` loops. — Amarth Gûl, Jan 22 '18 at 08:52
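For comparison with the "just write the loop" suggestion in the comments, a plain explicit loop is short and runs in linear time. A minimal sketch; the function name `split_by` is hypothetical:

```python
def split_by(items, sep):
    """Split a list into sublists at every occurrence of sep (sep itself is dropped)."""
    chunks, current = [], []
    for item in items:
        if item == sep:
            chunks.append(current)  # close the current chunk at each separator
            current = []
        else:
            current.append(item)
    if current:  # keep a trailing chunk that has no closing separator
        chunks.append(current)
    return chunks

lst = ['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']
print(split_by(lst, '\n'))
# [['12.5', '7', '45'], ['13.7', '52', '34.3']]
```

Unlike a `groupby`-based split, two consecutive separators here produce an empty sublist, which can be useful if empty rows are meaningful.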

user2390182 · Accepted Answer · 2018-01-22T08:49:16.573

2

You can use itertools.groupby, which of course does iterate over the list, but is implemented in C (in CPython) and is highly optimized:

from itertools import groupby

lst = ['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']

[list(g) for k, g in groupby(lst, '\n'.__eq__) if not k]
# [['12.5', '7', '45'], ['13.7', '52', '34.3']]

Or, with float conversion:

[list(map(float, g)) for k, g in groupby(lst, '\n'.__eq__) if not k]
# [[12.5, 7.0, 45.0], [13.7, 52.0, 34.3]]
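Since the asker's final goal is a numeric matrix, the float rows from `groupby` can be passed straight to `np.array`. A minimal sketch, assuming NumPy is installed and every row has the same length (ragged rows would not form a regular 2-D array):

```python
from itertools import groupby

import numpy as np

lst = ['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']

# groupby yields (key, group) pairs; the key is True for runs of '\n' and False
# otherwise, so keeping only the False runs drops the separators.
rows = [list(map(float, g)) for k, g in groupby(lst, '\n'.__eq__) if not k]

# np.array builds a 2-D float64 array because all rows have equal length.
mat = np.array(rows)
print(mat.shape)  # (2, 3)
```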

edited Jan 22 '18 at 08:49

answered Jan 22 '18 at 08:34

user2390182

Alternatively, one might also use `pandas` for similar functionality. – Mateen Ulhaq Jan 22 '18 at 08:36
Or `[list(g) for k, g in groupby(lst, '\n'.__eq__) if not k]` – Mazdak Jan 22 '18 at 08:48
@Kasramvd Very good point. Updated my answer. Maybe slightly less obvious to the beginner's eye, but definitely worth avoiding the lambda. – user2390182 Jan 22 '18 at 08:50
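The lambda-free key works because `'\n'.__eq__` is a bound method of the separator string. A short sketch showing that it behaves like `lambda x: x == '\n'` on strings, and what `groupby` yields before the `if not k` filter is applied:

```python
from itertools import groupby

lst = ['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']

# '\n'.__eq__(x) calls str.__eq__ with '\n' already bound as self, so for
# string elements it gives the same result as the lambda.
key_method = '\n'.__eq__
key_lambda = lambda x: x == '\n'
assert all(key_method(x) == key_lambda(x) for x in lst)

# Each (key, group) pair is one run of consecutive elements sharing the same key.
for k, g in groupby(lst, key_method):
    print(k, list(g))
# False ['12.5', '7', '45']
# True ['\n']
# False ['13.7', '52', '34.3']
# True ['\n']

# Caveat: for a non-str argument the method returns NotImplemented, not False,
# so the method form is only a safe replacement when every element is a str.
print('\n'.__eq__(3))  # NotImplemented
```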

Mateen Ulhaq · Answer 2 · 2018-01-22T09:01:58.967

1

Using numpy:

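import numpy as np
arr = np.array(['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n'])
rows = [r[:-1] for r in np.split(arr, np.where(arr == '\n')[0] + 1)[:-1]]  # drop empty tail and each row's '\n'
mat = np.array(rows).astype(float)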
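The question mentions not understanding how `np.where()` works with `np.split()`. Printing the intermediate values makes it clearer; a sketch assuming the same input as the question:

```python
import numpy as np

arr = np.array(['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n'])

# np.where(condition) returns a tuple of index arrays, one per dimension;
# [0] takes the indices along the only axis of this 1-D array.
sep_idx = np.where(arr == '\n')[0]
print(sep_idx)  # [3 7]

# np.split cuts *before* each given index, so sep_idx + 1 cuts right after each '\n'.
pieces = np.split(arr, sep_idx + 1)
print([p.tolist() for p in pieces])
# [['12.5', '7', '45', '\n'], ['13.7', '52', '34.3', '\n'], []]

# The last piece is empty (the input ends with '\n') and every other piece still
# ends with its '\n', so both must be dropped before converting to float.
rows = [p[:-1] for p in pieces[:-1]]
mat = np.array(rows).astype(float)
```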

Alternatively, if we're sure to be dealing with a matrix (every row has the same length and ends with '\n'), you could simply find the first occurrence of '\n', reshape, and slice off the separator column.

first = np.argmax(arr == '\n')  # index of the first '\n', i.e. the row length
mat = arr.reshape(-1, first + 1)[:, :first].astype(float)  # drop the '\n' column

This avoids building a list of sub-arrays, so it might be faster.
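The reshape trick relies entirely on the matrix assumption. If it does not hold, the result can silently lose data: for `['1', '2', '\n', '3', '4', '5']` the reshape-and-slice returns `[[1, 2], [3, 4]]` and drops `'5'`. A sketch of a hypothetical helper, `to_matrix`, that checks the layout before reshaping:

```python
import numpy as np

def to_matrix(arr, sep='\n'):
    """Reshape a flat, separator-terminated string array into a float matrix.

    Hypothetical helper: assumes every row has the same length and ends with sep,
    and raises ValueError when that layout does not hold.
    """
    arr = np.asarray(arr)
    width = int(np.argmax(arr == sep))  # row length = index of the first separator
    # np.argmax returns 0 when sep is absent, so confirm sep really sits at width.
    if arr.size == 0 or arr[width] != sep or arr.size % (width + 1) != 0:
        raise ValueError("input is not a separator-terminated rectangular layout")
    grid = arr.reshape(-1, width + 1)
    if not np.all(grid[:, width] == sep):  # every row must end with the separator
        raise ValueError("rows have different lengths")
    return grid[:, :width].astype(float)

print(to_matrix(['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']).shape)  # (2, 3)
```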

edited Jan 22 '18 at 09:01

answered Jan 22 '18 at 08:56

Mateen Ulhaq

score 0 · Answer 3 · answered Jan 22 '18 at 08:37

I made a thing for this once upon a time: a chunking module. It's made to work similarly to str.split.

pip install chunking

Then

>>> from chunking import split
>>> a_list = ["foo", 'bar', 'SEP', 'bacon', 'eggs']
>>> split(a_list, 'SEP')
[['foo', 'bar'], ['bacon', 'eggs']]

There's also chunking.iter_split, which is a generator variant of that.
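A generator builds only one chunk at a time, which matters when the input is a stream too large (or endless) to hold in memory. A minimal sketch of the idea in plain Python; `iter_split` and `endless` here are hypothetical stand-ins, not the chunking package's code, and the signature is an assumption:

```python
from itertools import count, islice

def iter_split(items, sep):
    """Lazily yield sublists of items separated by sep (sep itself is dropped)."""
    chunk = []
    for item in items:
        if item == sep:
            yield chunk  # hand out a finished chunk before reading further input
            chunk = []
        else:
            chunk.append(item)
    if chunk:  # trailing chunk with no closing separator
        yield chunk

def endless():
    """Hypothetical endless stream: '0', '1', '\n', '2', '3', '\n', ..."""
    for i in count(0, 2):
        yield str(i)
        yield str(i + 1)
        yield '\n'

# Only the first three chunks are ever built, even though the input never ends.
print(list(islice(iter_split(endless(), '\n'), 3)))
# [['0', '1'], ['2', '3'], ['4', '5']]
```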
