I am trying to convert a vector containing the letters A-L into something like the layout below, using pandas and numpy built-in functions (tile, repeat and reshape) without loops, but I cannot wrap my head around it.

    0   1   2   3   4   5   6   7   8   9   10  11
0   A   A   A   A   E   E   E   E   I   I   I   I
1   B   B   B   B   F   F   F   F   J   J   J   J
2   C   C   C   C   G   G   G   G   K   K   K   K
3   D   D   D   D   H   H   H   H   L   L   L   L
4   A   A   A   A   E   E   E   E   I   I   I   I
5   B   B   B   B   F   F   F   F   J   J   J   J
6   C   C   C   C   G   G   G   G   K   K   K   K
7   D   D   D   D   H   H   H   H   L   L   L   L

Do you have any ideas how I could do that without loops?

What I have tried so far:

import numpy as np

a = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'])
b = a.reshape(3,4)

np.repeat(b, 4).reshape(4,12)

gives me:

array([['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
       ['D', 'D', 'D', 'D', 'E', 'E', 'E', 'E', 'F', 'F', 'F', 'F'],
       ['G', 'G', 'G', 'G', 'H', 'H', 'H', 'H', 'I', 'I', 'I', 'I'],
       ['J', 'J', 'J', 'J', 'K', 'K', 'K', 'K', 'L', 'L', 'L', 'L']],
      dtype='<U1')

EDIT: Some background. Depending on the number of samples and the layout we choose, a machine creates plates (like in this image). We can do consecutive operations (add more chemicals etc.), and based on the previous layout, unique combinations are obtained. Afterwards the machine measures e.g. the concentration in each well, and I would like to link the output to the conditions in each well. Because the machine can measure e.g. the concentration after each step, a lot of data can be generated, so I am trying to find a generic solution without too many loops.

Moritz
  • `reshape` often returns a view, so it takes near-0 time (but see [when-will-numpy-copy-the-array-when-using-reshape](https://stackoverflow.com/questions/36995289/when-will-numpy-copy-the-array-when-using-reshape)) – denis Nov 15 '20 at 14:51

2 Answers

You could use:

>>> import numpy as np
>>> x = np.array(list('abcdefghijkl'.upper()))  # your "vector"
>>> np.repeat(np.tile(x.reshape(-1, 4), 2).T, 4, axis=1)
array([['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
       ['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
       ['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
       ['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L'],
       ['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
       ['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
       ['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
       ['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L']],
      dtype='<U1')

This first reshapes the vector so that each row holds 4 characters, then tiles the result to duplicate those rows side by side. Transposing then puts the letters in the correct rows and columns, and finally every character is repeated 4 times along the columns (`axis=1`).

Step-by-step it looks like this:

>>> import pandas as pd
>>> x
array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'],
      dtype='<U1')
>>> x.reshape(-1, 4)
array([['A', 'B', 'C', 'D'],
       ['E', 'F', 'G', 'H'],
       ['I', 'J', 'K', 'L']],
      dtype='<U1')
>>> np.tile(_, 2)
array([['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
       ['E', 'F', 'G', 'H', 'E', 'F', 'G', 'H'],
       ['I', 'J', 'K', 'L', 'I', 'J', 'K', 'L']],
      dtype='<U1')
>>> _.T
array([['A', 'E', 'I'],
       ['B', 'F', 'J'],
       ['C', 'G', 'K'],
       ['D', 'H', 'L'],
       ['A', 'E', 'I'],
       ['B', 'F', 'J'],
       ['C', 'G', 'K'],
       ['D', 'H', 'L']],
      dtype='<U1')
>>> np.repeat(_, 4, axis=1)
array([['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
       ['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
       ['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
       ['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L'],
       ['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
       ['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
       ['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
       ['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L']],
      dtype='<U1')
>>> pd.DataFrame(_)
   0  1  2  3  4  5  6  7  8  9  10 11
0  A  A  A  A  E  E  E  E  I  I   I  I
1  B  B  B  B  F  F  F  F  J  J   J  J
2  C  C  C  C  G  G  G  G  K  K   K  K
3  D  D  D  D  H  H  H  H  L  L   L  L
4  A  A  A  A  E  E  E  E  I  I   I  I
5  B  B  B  B  F  F  F  F  J  J   J  J
6  C  C  C  C  G  G  G  G  K  K   K  K
7  D  D  D  D  H  H  H  H  L  L   L  L
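
For other plate layouts, the same steps could be wrapped in a small helper. This is only a sketch under assumptions: `plate_layout` and its parameters `rows_per_block`, `vertical_copies` and `horizontal_repeats` are hypothetical names chosen for illustration, not part of NumPy or pandas, and the vector length is assumed to be divisible by `rows_per_block`.

```python
import numpy as np


def plate_layout(labels, rows_per_block=4, vertical_copies=2, horizontal_repeats=4):
    """Hypothetical helper: arrange a 1-D label vector into a plate-like grid."""
    labels = np.asarray(labels)
    # One row per group of `rows_per_block` consecutive labels.
    groups = labels.reshape(-1, rows_per_block)
    # Duplicate each group side by side, then transpose so every group
    # runs down a column and the copies end up stacked vertically.
    stacked = np.tile(groups, vertical_copies).T
    # Repeat every column so each label fills `horizontal_repeats` adjacent wells.
    return np.repeat(stacked, horizontal_repeats, axis=1)


layout = plate_layout(list("ABCDEFGHIJKL"))
print(layout.shape)  # (8, 12)
```

With the default arguments this should reproduce the 8 x 12 array shown above; passing the result to `pd.DataFrame` gives the same table.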
MSeifert
  • Thank you for the answer and sorry for my sloppy question. I made a mistake while uploading – Moritz Sep 21 '17 at 14:07
  • Wow, that is a really comprehensive answer. I will try to understand it without copying and pasting the code. – Moritz Sep 21 '17 at 14:16
import numpy as np

a = np.array(list("ABCDEFGHIJKL"))

a
# array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'], 
#       dtype='<U1')

np.repeat(np.tile(a.reshape(3,4), 2).T, 4, axis=1)
#array([['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
#       ['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
#       ['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
#       ['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L'],
#       ['A', 'A', 'A', 'A', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I'],
#       ['B', 'B', 'B', 'B', 'F', 'F', 'F', 'F', 'J', 'J', 'J', 'J'],
#       ['C', 'C', 'C', 'C', 'G', 'G', 'G', 'G', 'K', 'K', 'K', 'K'],
#       ['D', 'D', 'D', 'D', 'H', 'H', 'H', 'H', 'L', 'L', 'L', 'L']], 
#      dtype='<U1')
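
As a possible illustration of why the `axis=1` argument matters here, the sketch below compares `np.repeat` with and without it, using the same `a` as above. The variable names `flat`, `wide` and `result` are only for this example. Without `axis`, `np.repeat` flattens its input first, which seems to be why the attempt in the question ends up with the letters in the wrong order after reshaping.

```python
import numpy as np

a = np.array(list("ABCDEFGHIJKL"))
b = a.reshape(3, 4)

# Without `axis`, np.repeat flattens its input first, so the 2-D layout is lost.
flat = np.repeat(b, 4)
print(flat.shape)  # (48,)

# With `axis=1`, each column is repeated in place and the rows are preserved.
wide = np.repeat(b, 4, axis=1)
print(wide.shape)  # (3, 16)

# Tiling and transposing before the repeat gives the requested 8 x 12 layout.
result = np.repeat(np.tile(b, 2).T, 4, axis=1)
print(result.shape)  # (8, 12)
```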
Psidom