With three lists, two of which are array coordinates, how do I create an array in python?

Question

I have three lists (really columns in a pandas dataframe): one with the data of interest, one with x array coordinates, and one with y array coordinates. All lists are the same length, and entries at the same position belong together (so L1: "Apple" corresponds to L2: "1" and L3: "A"). I would like to build an array whose dimensions are given by the two coordinate lists and whose cells are filled from the data list. What is the best way to do this?

The expected output would be in the form of a numpy array or something like:

array = [[0,0,0,3,0,0,2,3], [0,0,0,0,0,0,0,3]]  # based on the data below

In this example the array has the dimensions y = 2 (from y.unique()) and x = 8 (from x.unique()).

The following is example input data for what I am talking about:

array_x	array_y	Data
1	a	0
2	a	0
3	a	0
4	a	3
5	a	0
6	a	0
7	a	2
8	a	3
1	b	0
2	b	0
3	b	0
4	b	0
5	b	0
6	b	0
7	b	0
8	b	3

I think you've provided a good explanation, but will you please provide two things to make helping you MUCH easier: 1. a text sample of your dataframe (e.g. `print(df.head().to_dict())`), and based on that, a sample dataframe containing your expected output? Thank you :) — , Dec 14 '21 at 18:17
Instead of a screenshot of data, please provide data as text that can be easily copied/pasted, for example using `to_dict()` as outlined in comments. — BigBen, Dec 14 '21 at 18:24
It's unclear how you get from your dataframe to the desired output. You mention L1:Apple which is nowhere in the image you posted, then array_y has characters instead of numbers, and you have a column Error_type which seems to play a role in building the output array but is not explained. — , Dec 14 '21 at 18:25

score 3 · Accepted Answer · 2021-12-14T19:17:39.733

You may be looking for pivot:

out = df.pivot(values=['Data'], columns=['array_y'], index=['array_x']).to_numpy()

Output:

array([[0, 0],
       [0, 0],
       [0, 0],
       [3, 0],
       [0, 0],
       [0, 0],
       [2, 0],
       [3, 3]], dtype=int64)
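The output above has shape (8, 2), while the question asks for one row per `array_y` value. A minimal sketch of the same `pivot` idea with the index and columns swapped, assuming the column names from the question and that every (x, y) pair appears exactly once (`pivot` raises on duplicate pairs, and missing pairs would come out as NaN):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "array_x": list(range(1, 9)) * 2,
    "array_y": ["a"] * 8 + ["b"] * 8,
    "Data": [0, 0, 0, 3, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0, 0, 3],
})

# Rows come from array_y and columns from array_x. Keyword arguments are
# used because newer pandas releases no longer accept positional ones here.
grid = df.pivot(index="array_y", columns="array_x", values="Data").to_numpy()

print(grid.shape)
print(grid)
```

Calling `.T` on the original (8, 2) result should give the same array.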

edited Dec 14 '21 at 19:17

answered Dec 14 '21 at 18:31

`df.pivot('array_x', 'array_y', 'Data').to_numpy()` – Scott Boston Dec 14 '21 at 19:11

score 1 · Answer 2 · answered Dec 14 '21 at 18:21

Suppose you have a dataframe like this:

import pandas as pd
import numpy as np
myDataframe = pd.DataFrame([[1,2],[3,4],[5,6]], columns=['x','y'])

Then you can select the columns you want and create an array from them:

my_array = np.array(myDataframe[['x','y']])


>>> my_array
array([[1, 2],
       [3, 4],
       [5, 6]], dtype=int64)

answered Dec 14 '21 at 18:21

Or better `myDataframe[['x','y']].to_numpy()` (`.values` is not formally deprecated, but the pandas documentation recommends `to_numpy()` instead) – Dec 14 '21 at 18:23
How is `myDataframe[['x','y']].to_numpy()` better than using the `np.array()` constructor? – Dec 14 '21 at 18:26
Unfortunately, that is not what I am looking for. The two x and y lists provide coordinates to the position of the third lists (column) in the array. I don't want to make an array of the x and y list but use them to make an array with x and y dimensions filled with z data. – Andrew D Dec 14 '21 at 18:27
@Corralien I never really understood _why_ it is, but check out this post I recently read: https://stackoverflow.com/a/54508052/17242583 – Dec 14 '21 at 18:28

score 1 · Answer 3 · answered Dec 14 '21 at 19:05

You could do a zip (note: I'm shorthand-ing some of your example data):

data_x = [1, 2, 3, 4, 5, 6, 7, 8] * 2
data_y = ['a'] * 8 + ['b'] * 8
data_vals = [0,0,0,3,0,0,2,3,0,0,0,0,0,0,0,3]

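# Collect one list of values per y label.
coll = dict()
for (x, y, val) in zip(data_x, data_y, data_vals):
    if coll.get(y) is None:
        coll[y] = []

    # Pad the row with zeros until position x exists (x is 1-based).
    if x > len(coll[y]):
        coll[y].extend([0] * (x - len(coll[y])))

    coll[y][x - 1] = val

# Order the rows by y label.
result = []
for k in sorted(coll):
    result.append(coll[k])

print(coll)
print(result)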

Output:

{'a': [0, 0, 0, 3, 0, 0, 2, 3], 'b': [0, 0, 0, 0, 0, 0, 0, 3]}
[[0, 0, 0, 3, 0, 0, 2, 3], [0, 0, 0, 0, 0, 0, 0, 3]]
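If a NumPy array is wanted rather than nested lists, the same coordinate-to-cell mapping can be sketched with `np.unique(..., return_inverse=True)`, which turns each coordinate label into an integer position. This is a different technique from the zip loop above, and it assumes the labels should appear in sorted order along each axis:

```python
import numpy as np

data_x = [1, 2, 3, 4, 5, 6, 7, 8] * 2
data_y = ["a"] * 8 + ["b"] * 8
data_vals = [0, 0, 0, 3, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0, 0, 3]

# np.unique returns the sorted distinct labels and, for every input
# element, the position of its label within that sorted list.
x_labels, col_idx = np.unique(data_x, return_inverse=True)
y_labels, row_idx = np.unique(data_y, return_inverse=True)

# Start from zeros so coordinate pairs absent from the input stay 0.
grid = np.zeros((len(y_labels), len(x_labels)), dtype=int)
grid[row_idx, col_idx] = data_vals

print(grid)
```

Unlike the loop, this does not require the x values to be consecutive integers starting at 1; any sortable labels work, and `x_labels` / `y_labels` record which label each column and row stands for.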

answered Dec 14 '21 at 19:05

scooter me fecit

Alternatively, if you know the max range for x, you can preallocate the lists and avoid the overhead of calling extend(). – scooter me fecit Dec 14 '21 at 19:10
i.e., replace "coll[y] = []" with "coll[y] = [0] * max_x". And delete the "if x > len(coll[y]):" statements. – scooter me fecit Dec 14 '21 at 19:17
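The preallocation described in these comments might look like the following sketch. Here `max_x` is taken from the data itself, which assumes the x values are 1-based integers, and `setdefault` is used as a stand-in for the explicit `is None` check:

```python
data_x = [1, 2, 3, 4, 5, 6, 7, 8] * 2
data_y = ["a"] * 8 + ["b"] * 8
data_vals = [0, 0, 0, 3, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0, 0, 3]

# Assumption: x values are 1-based integers, so the largest x is the row length.
max_x = max(data_x)

coll = {}
for x, y, val in zip(data_x, data_y, data_vals):
    # setdefault creates the zero-filled row the first time a y label is seen.
    coll.setdefault(y, [0] * max_x)[x - 1] = val

# Order the rows by y label.
result = [coll[k] for k in sorted(coll)]
print(result)
```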
