Say I have an array of str:

['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']

And I want to split it by value, in this case by '\n', so it becomes:

[['12.5',  '7', '45'],
 ['13.7', '52', '34.3']]

I don't want to loop over every element myself, since that is time-consuming for large inputs. So I wonder if there are functions or Python tricks that can achieve this easily.

P.S.

I've seen this question, but it doesn't help much, mainly because I don't quite understand how np.where() works with np.split(), and also because I'm working with str values.

Another thing that might be helpful: my final goal is to generate a matrix of numbers (probably float), so I'd also be glad to know if there's a numpy function that can do this.

Amarth Gûl
  • Even if you don't want to use a loop to iterate through your elements and you prefer using "some functions or python tricks that can easily achieve this", these tools you are looking for **will** use a loop. So why not use one yourself for such a basic operation? – IMCoins Jan 22 '18 at 08:36
  • @IMCoins I learned from some courses that many packages use the GPU for matrix computations, which is faster than implementing it myself with an explicit `for` loop. – Amarth Gûl Jan 22 '18 at 08:38
  • @AmarthGûl Unfortunately, most of the packages that do that are 3rd party packages, and a loop is usually your best bet because it is implemented in C. – cs95 Jan 22 '18 at 08:45
  • @cᴏʟᴅsᴘᴇᴇᴅ Well, when implementing matrix computations, I found `numpy` functions to be much faster than operations I wrote myself. So I was actually hoping `numpy` could save me again. Now it seems you're right: the answers below still use `for` loops. – Amarth Gûl Jan 22 '18 at 08:52

3 Answers

2

You can use itertools.groupby which, of course, does iterate the list, but is highly optimized:

from itertools import groupby

lst = ['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']

[list(g) for k, g in groupby(lst, '\n'.__eq__) if not k]
# [['12.5', '7', '45'], ['13.7', '52', '34.3']]

Or, with float conversion:

[list(map(float, g)) for k, g in groupby(lst, '\n'.__eq__) if not k]
# [[12.5, 7.0, 45.0], [13.7, 52.0, 34.3]]
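
Since the stated end goal is a numeric matrix, the float-converted rows can be handed straight to np.array. The following is only a minimal sketch of that idea: the helper name to_matrix is made up for illustration, and it assumes every row has the same length.

```python
from itertools import groupby

import numpy as np


def to_matrix(tokens, sep='\n'):
    """Split tokens on sep and return a 2-D float array (hypothetical helper)."""
    # groupby yields runs of consecutive items that share a key; the key here
    # is True for separator items and False for everything else.
    rows = [list(map(float, group))
            for is_sep, group in groupby(tokens, key=lambda t: t == sep)
            if not is_sep]
    # Assumes all rows have equal length; ragged rows would not form a matrix.
    return np.array(rows)


lst = ['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n']
mat = to_matrix(lst)
print(mat.shape)  # (2, 3)
```

The loop over the list still happens, but inside groupby and map rather than in hand-written Python statements.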
user2390182
1

Using numpy:

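import numpy as np
arr = np.array(['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n'])
rows = [r[:-1] for r in np.split(arr, np.where(arr == '\n')[0] + 1)[:-1]]  # drop each row's trailing '\n'
mat = np.array(rows).astype(float)  # np.float was removed in NumPy 1.24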

Alternatively, if we're sure to be dealing with a matrix, you could simply search for the first occurrence of '\n', reshape, and slice using that.

first = np.argmax(arr == '\n')  # index of the first '\n', i.e. the row length
mat = arr.reshape(-1, first + 1)[:, :first].astype(float)  # slice off the '\n' column

This might be faster.
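
The reshape approach silently relies on the separators being evenly spaced. A hedged sketch of how that assumption could be checked before reshaping is shown here; reshape_rows is a hypothetical helper name, not a numpy function.

```python
import numpy as np


def reshape_rows(arr, sep='\n'):
    """Hypothetical helper: reshape a flat string array into a float matrix."""
    is_sep = arr == sep
    first = int(np.argmax(is_sep))  # index of the first separator = row length
    width = first + 1               # each row plus its trailing separator
    if not is_sep.any() or arr.size % width != 0:
        raise ValueError("input is not a whole number of equal-length rows")
    grid = arr.reshape(-1, width)
    # The last column must hold only separators, and no other column may.
    if not (grid[:, -1] == sep).all() or (grid[:, :-1] == sep).any():
        raise ValueError("separators are not evenly spaced")
    return grid[:, :first].astype(float)


arr = np.array(['12.5', '7', '45', '\n', '13.7', '52', '34.3', '\n'])
mat = reshape_rows(arr)
print(mat.shape)  # (2, 3)
```

With the check in place, ragged input raises a ValueError instead of producing a misaligned matrix or an obscure reshape error.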

Mateen Ulhaq
0

I made a thing for this once upon a time: a chunking module. It's made to work similarly to str.split.

pip install chunking

Then

>>> from chunking import split
>>> a_list = ["foo", 'bar', 'SEP', 'bacon', 'eggs']
>>> split(a_list, 'SEP')
[['foo', 'bar'], ['bacon', 'eggs']]

There's also chunking.iter_split, which is a generator variant of that.
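
For readers who would rather not add a dependency, a generator along these lines can be written in a few lines of plain Python. This is only an illustrative sketch: iter_split_sketch is a made-up name and is not the chunking package's actual implementation.

```python
def iter_split_sketch(items, sep):
    """Yield lists of consecutive items found between occurrences of sep."""
    chunk = []
    for item in items:
        if item == sep:
            # A separator closes the current chunk and starts a new one.
            yield chunk
            chunk = []
        else:
            chunk.append(item)
    # Emit whatever follows the last separator (possibly an empty list).
    yield chunk


a_list = ["foo", 'bar', 'SEP', 'bacon', 'eggs']
print(list(iter_split_sketch(a_list, 'SEP')))
# [['foo', 'bar'], ['bacon', 'eggs']]
```

The sketch mirrors str.split, so an input that ends with the separator (like the list in the question) produces a final empty chunk; the answer does not say whether chunking behaves the same way.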

sytech