I would like to take a list of values and transform it into a table (a 2D list) of 0s and 1s, with one column for each unique number in the source list and as many rows as the original list. Each row has a single 1, in the column whose index equals the original value minus 1.

I have code that accomplishes this task, but I'm wondering if there is a better/faster way to do it. (The actual dataset has millions of entries vs. the simplified set below)

Sample Input:

value_list = [1, 2, 1, 3, 6, 5, 4, 3]

Desired output:

output_table = [[1, 0, 0, 0, 0, 0],
                [0, 1, 0, 0, 0, 0],
                [1, 0, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 0, 1],
                [0, 0, 0, 0, 1, 0],
                [0, 0, 0, 1, 0, 0],
                [0, 0, 1, 0, 0, 0]]

Current Solution:

value_list = [1, 2, 1, 3, 6, 5, 4, 3]
max_val = max(value_list)

# initialize to table of 0's
a = [([0] * max_val) for i in range(len(value_list))]

# overwrite with 1's where required
for i in range(len(value_list)):
    j = value_list[i] - 1
    a[i][j] = 1

print('a =')
for row in a:
    print(row)
KKB
  • If you're dealing with large amounts of data, it might be worth using NumPy. What kind of data is it? – AMC Jan 23 '20 at 18:37
  • It comes from a text file (I extract that into the 1D list of values in another step). All of the numbers in the source data are integers. – KKB Jan 23 '20 at 18:39
  • This is basically one-hot encoding. If you can use NumPy, your life will be easier. I've marked a duplicate for you to have a look at. – rayryeng Jan 23 '20 at 18:47
  • If you're looking to reduce processing (and potentially even memory), it may be possible to subclass `numpy.ndarray` such that the underlying data is just the contents of `value_list`, but it returns views that look like `output_table`. – Aaron Jan 23 '20 at 18:52
  • `import numpy as np; a = np.array([1, 2, 1, 3, 6, 5, 4, 3]); b = np.arange(1,a.max()+1); c = 1 * (a[:,None] == b[None,:])` – wwii Jan 23 '20 at 19:20
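The one-liner in the last comment can be unpacked into a few named steps. This is only a sketch of that broadcasting idea: the names `values`, `columns` and `one_hot` and the `uint8` dtype are my own choices, not part of the comment. It assumes the values are integers starting at 1.

```python
import numpy as np

value_list = [1, 2, 1, 3, 6, 5, 4, 3]

values = np.asarray(value_list)             # shape (n,)
columns = np.arange(1, values.max() + 1)    # candidate values 1..max, shape (k,)

# Compare an (n, 1) column against a (1, k) row; broadcasting yields an
# (n, k) boolean table that is True only where the value matches the column.
one_hot = (values[:, None] == columns[None, :]).astype(np.uint8)

print(one_hot)
```

Note that the comparison builds a full n-by-k boolean array before the cast, so for millions of rows it is worth checking whether that temporary fits in memory.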

2 Answers

You can do:

import numpy as np

value_list = [1, 2, 1, 3, 6, 5, 4, 3]

# create matrix of zeros
x = np.zeros(shape=(len(value_list), max(value_list)), dtype='int')

for i,v in enumerate(value_list):
    x[i,v-1] = 1

print(x)

Output:

[[1 0 0 0 0 0]
 [0 1 0 0 0 0]
 [1 0 0 0 0 0]
 [0 0 1 0 0 0]
 [0 0 0 0 0 1]
 [0 0 0 0 1 0]
 [0 0 0 1 0 0]
 [0 0 1 0 0 0]]
Sociopath
  • You can make the assignment of all rows simultaneously without a loop by `x[np.arange(len(value_list)), np.array(value_list) - 1] = 1` after you create the initial array `x`. – rayryeng Jan 24 '20 at 00:21
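Putting the comment above together with the answer, a loop-free version might look like the sketch below. It assumes every value is an integer of at least 1; the `uint8` dtype is my own assumption, chosen to keep the table small for millions of rows, and is not from the answer.

```python
import numpy as np

value_list = [1, 2, 1, 3, 6, 5, 4, 3]
values = np.asarray(value_list)

# Table of zeros: one row per input value, one column per value 1..max.
x = np.zeros((len(values), values.max()), dtype=np.uint8)

# Integer-array indexing pairs row i with column values[i] - 1,
# so every 1 is written in a single vectorized assignment.
x[np.arange(len(values)), values - 1] = 1

print(x)
```

If a list of lists is needed afterwards, `x.tolist()` converts the array back.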

You can try this:

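value_list = [1, 2, 1, 3, 6, 5, 4, 3]
dummy_list = [0] * max(value_list)  # row of zeros, as wide as the largest value
# splice a 1 into position i-1 of each row
output_table = [dummy_list[:i-1] + [1] + dummy_list[i:] for i in value_list]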

Output:

output_table = [[1, 0, 0, 0, 0, 0],
                [0, 1, 0, 0, 0, 0],
                [1, 0, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 0, 1],
                [0, 0, 0, 0, 1, 0],
                [0, 0, 0, 1, 0, 0],
                [0, 0, 1, 0, 0, 0]]
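One thing to watch with the slicing approach: a value below 1 does not raise an error, it silently produces a row of the wrong length, because a negative slice bound counts from the end of the list. A minimal sketch that wraps the same idea in a function with a guard (the name `one_hot_rows` is hypothetical, and the row width is taken from the largest value rather than hard-coded):

```python
def one_hot_rows(values):
    """Return one 0/1 row per value, with the 1 in column value - 1."""
    if not values:
        return []
    if min(values) < 1:
        # e.g. a value of 0 would slice zeros[:-1] and give a row that is too long
        raise ValueError("values must be integers >= 1")
    zeros = [0] * max(values)
    return [zeros[:v - 1] + [1] + zeros[v:] for v in values]


output_table = one_hot_rows([1, 2, 1, 3, 6, 5, 4, 3])
for row in output_table:
    print(row)
```

This stays in pure Python, so for millions of entries it will likely be slower and use more memory than a NumPy array.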
Sayandip Dutta