
How can I split a 2D array by a grouping variable and return a list of arrays? The order is also important.

To show expected outcome, the equivalent in R can be done as

> (A = matrix(c("a", "b", "a", "c", "b", "d"), nr=3, byrow=TRUE)) # input
     [,1] [,2]
[1,] "a"  "b" 
[2,] "a"  "c" 
[3,] "b"  "d" 
> (split.data.frame(A, A[,1])) # output
$a
     [,1] [,2]
[1,] "a"  "b" 
[2,] "a"  "c" 

$b
     [,1] [,2]
[1,] "b"  "d" 

EDIT: To clarify: I'd like to split the array/matrix A into a list of multiple arrays based on the unique values in the first column. That is, split A into one array where the first column has an a, and another array where the first column has a b.

I have tried the approach from "Python equivalent of R "split"-function", but this gives three arrays:

import numpy as np
import itertools
A = np.array([["a", "b"], ["a", "c"], ["b", "d"]])
b = A[:,0]

def split(x, f):
     return list(itertools.compress(x, f)), list(itertools.compress(x, (not i for i in f)))
split(A, b) 

([array(['a', 'b'], dtype='<U1'),
  array(['a', 'c'], dtype='<U1'),
  array(['b', 'd'], dtype='<U1')],
 [])

I also tried numpy.split, using np.split(A, b), but it needs integers. I thought I might be able to use "How to convert strings into integers in Python?" to convert the letters to integers, but even if I pass integers, it doesn't split as expected:

c = np.transpose(np.array([1,1,2]))
np.split(A, c) # returns 4 arrays

Can this be done? Thanks.

EDIT: Please note that this is a small example; the number of groups may be greater than two, and they may not be ordered.

user2957945

2 Answers


You can use pandas:

import pandas as pd
import numpy as np

a = np.array([["a", "b"], ["a", "c"], ["b", "d"]])

listofdfs = {}
for n,g in pd.DataFrame(a).groupby(0):
    listofdfs[n] = g

listofdfs['a'].values

Output:

array([['a', 'b'],
       ['a', 'c']], dtype=object)

And,

listofdfs['b'].values

Output:

array([['b', 'd']], dtype=object)
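
The loop above can also be written as a dict comprehension. This is only a minimal sketch, not part of the original answer: it assumes the groups should come out in order of first appearance (hence `sort=False`), and it uses a slightly larger, unordered input invented for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative input (an assumption): groups are interleaved and not sorted
a = np.array([["b", "d"], ["a", "b"], ["b", "e"], ["a", "c"]])

# sort=False keeps the groups in order of first appearance instead of sorted key order;
# rows inside each group keep their original relative order
groups = {key: g.to_numpy() for key, g in pd.DataFrame(a).groupby(0, sort=False)}

for key, arr in groups.items():
    print(key, arr.tolist())
```

Dropping `sort=False` would return the keys in sorted order instead, which is closer to what R's split does.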

Or, you could use itertools groupby:

import numpy as np
from itertools import groupby
l = [np.stack(list(g)) for k, g in groupby(a, lambda x: x[0])]

l[0]

Output:

array([['a', 'b'],
       ['a', 'c']], dtype='<U1')

And,

l[1]

Output:

array([['b', 'd']], dtype='<U1')
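
Note that itertools.groupby only groups consecutive rows with equal keys, so it works on the example because the a rows are adjacent. If the groups may be interleaved, one NumPy-only option is to sort the rows stably by key and then split at the positions where the key changes. The following is a sketch under that assumption, with an illustrative input that is not from the question:

```python
import numpy as np

# Illustrative input (an assumption): the "a" and "b" rows are interleaved
a = np.array([["a", "b"], ["b", "d"], ["a", "c"], ["b", "e"]])

keys = a[:, 0]
# A stable sort keeps rows within each group in their original order
order = np.argsort(keys, kind="stable")
sorted_keys = keys[order]
# np.split expects split points (indices), not group labels:
# these are the positions where the sorted key changes
boundaries = np.flatnonzero(sorted_keys[1:] != sorted_keys[:-1]) + 1
groups = np.split(a[order], boundaries)

for g in groups:
    print(g.tolist())
```

This also shows why passing group labels such as [1, 1, 2] to np.split does not work: the second argument is read as a list of indices to cut at, not as a grouping variable.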
Scott Boston
  • Great, thanks Scott, that looks good. I'd considered coercing to a dataframe but I thought there may be array tools -- but this is good. – user2957945 Nov 14 '18 at 19:14
  • Brilliant, thank you very much. I'm traipsing through https://stackoverflow.com/questions/773/how-do-i-use-pythons-itertools-groupby, so your edit gives me the output for my understanding to work towards – user2957945 Nov 14 '18 at 19:30

If I understand your question, you can do simple slicing, as in:

import numpy as np

a = np.array([["a", "b"], ["a", "c"], ["b", "d"]])

x,y=a[:2,:],a[2,:]

x
array([['a', 'b'],
       ['a', 'c']], dtype='<U1')

y
array(['b', 'd'], dtype='<U1')
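
Fixed slices only work when the group boundaries are known in advance. For inputs where rows of the same group are not contiguous, a boolean mask per unique key is one possible generalization. This is a minimal sketch that goes beyond the slicing shown above; the variable names `labels` and `groups` are made up for illustration:

```python
import numpy as np

a = np.array([["a", "b"], ["b", "d"], ["a", "c"], ["b", "d"]])

# dict.fromkeys de-duplicates the first column while keeping first-appearance order
labels = dict.fromkeys(a[:, 0].tolist())

# One boolean mask per label selects all rows of that group, in original row order
groups = {k: a[a[:, 0] == k] for k in labels}

for k, arr in groups.items():
    print(k, arr.tolist())
```

Each mask scans the whole first column, so this is simple rather than fast when there are many distinct groups.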
G. Anderson
  • Hi G.Anderson, thank you for your answer. This would fail for `a = np.array([["a", "b"], ["b", "d"], ["a", "c"], ["b", "d"]])`, or if there were more groups. Apologies, maybe my example was too minimal. – user2957945 Nov 14 '18 at 18:55
  • I see. I answered before you edited about grouping the splits based on value. Perhaps [this answer](https://stackoverflow.com/questions/38277182/splitting-numpy-array-based-on-value) might help? – G. Anderson Nov 14 '18 at 19:07
  • Thanks. That looks promising-- I'm just trying to tweak it to my example. – user2957945 Nov 14 '18 at 19:13