4

I have three lists (really columns in a pandas dataframe): one with the data of interest, one with x array coordinates, and one with y array coordinates. All lists are the same length, and their order lines up, so each data value belongs to the coordinates at the same position (so L1: "Apple" coincides with L2: "1" and L3: "A"). I would like to make an array with the dimensions given by the two coordinate lists, filled with data from the data list. What is the best way to do this?

The expected output would be in the form of a numpy array or something like:

array = [[0,0,0,3,0,0,2,3],[0,0,0,0,0,0,0,3]] # based on the data below

In this example the array has 2 rows (the number of unique values in array_y) and 8 columns (the number of unique values in array_x).

The following is example input data for what I am talking about:

array_x array_y Data
1 a 0
2 a 0
3 a 0
4 a 3
5 a 0
6 a 0
7 a 2
8 a 3
1 b 0
2 b 0
3 b 0
4 b 0
5 b 0
6 b 0
7 b 0
8 b 3
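
For anyone who wants to reproduce this, the table above could be rebuilt as a dataframe roughly like so. This is only a sketch: the variable name `df` is an assumption, while the column names are taken from the table.

```python
import pandas as pd

# Rebuild the sample table; `df` is just an illustrative name.
df = pd.DataFrame({
    "array_x": list(range(1, 9)) * 2,          # 1..8 for 'a', then 1..8 for 'b'
    "array_y": ["a"] * 8 + ["b"] * 8,
    "Data": [0, 0, 0, 3, 0, 0, 2, 3,           # values for array_y == 'a'
             0, 0, 0, 0, 0, 0, 0, 3],          # values for array_y == 'b'
})

print(df.shape)
```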
Andrew D
  • 3
    Update a sample as plain text please. – Corralien Dec 14 '21 at 18:17
  • 2
    I think you've provided a good explanation, but will you please provide two things to make helping you MUCH easier: 1. a text sample of your dataframe (e.g. `print(df.head().to_dict())`), and based on that, a sample dataframe containing your expected output? Thank you :) –  Dec 14 '21 at 18:17
  • Good point. I have added that information. – Andrew D Dec 14 '21 at 18:24
  • 1
    Instead of a screenshot of data, please provide data as text that can be easily copied/pasted, for example using `to_dict()` as outlined in comments. – BigBen Dec 14 '21 at 18:24
  • 2
    It's unclear how you get from your dataframe to the desired output. You mention L1:Apple which is nowhere in the image you posted, then array_y has characters instead of numbers, and you have a column Error_type which seems to play a role in building the output array but is not explained. –  Dec 14 '21 at 18:25
  • I have updated and clarified. – Andrew D Dec 14 '21 at 18:38

3 Answers

3

You may be looking for pivot:

out = df.pivot(values=['Data'], columns=['array_y'], index=['array_x']).to_numpy()

Output:

array([[0, 0],
       [0, 0],
       [0, 0],
       [3, 0],
       [0, 0],
       [0, 0],
       [2, 0],
       [3, 3]], dtype=int64)
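
Note that the output above is 8 by 2, with x running along the rows. If the goal is the 2 by 8 layout from the question, one option is to swap `index` and `columns` (transposing the result would be equivalent). The following is a sketch that assumes each (array_x, array_y) pair occurs only once; the dataframe construction just rebuilds the question's sample data, and the names `grid` and `out` are illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "array_x": list(range(1, 9)) * 2,
    "array_y": ["a"] * 8 + ["b"] * 8,
    "Data": [0, 0, 0, 3, 0, 0, 2, 3, 0, 0, 0, 0, 0, 0, 0, 3],
})

# One row per unique array_y, one column per unique array_x.
grid = df.pivot(index="array_y", columns="array_x", values="Data")
out = grid.to_numpy()

print(out)
# [[0 0 0 3 0 0 2 3]
#  [0 0 0 0 0 0 0 3]]
```

If some coordinate pairs are missing, `pivot` leaves NaN in those cells, so a `.fillna(0)` before `.to_numpy()` may be needed; duplicate pairs make `pivot` raise a `ValueError`.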
1

Suppose you have a dataframe like this:

import pandas as pd
import numpy as np
myDataframe = pd.DataFrame([[1,2],[3,4],[5,6]], columns=['x','y'])

Then you can select the columns you want and create an array from them:

my_array = np.array(myDataframe[['x','y']])


>>> my_array
array([[1, 2],
       [3, 4],
       [5, 6]], dtype=int64)
  • 1
    Or better `myDataframe[['x','y']].to_numpy()` (the pandas docs recommend it over `.values`) –  Dec 14 '21 at 18:23
  • How is `myDataframe[['x','y']].to_numpy()` better than using the `np.array()` constructor? –  Dec 14 '21 at 18:26
  • Unfortunately, that is not what I am looking for. The two x and y lists provide coordinates to the position of the third lists (column) in the array. I don't want to make an array of the x and y list but use them to make an array with x and y dimensions filled with z data. – Andrew D Dec 14 '21 at 18:27
  • @Corralien I never really understood _why_ it is, but check out this post I recently read: https://stackoverflow.com/a/54508052/17242583 –  Dec 14 '21 at 18:28
1

You could do a zip (note: I'm shorthand-ing some of your example data):

data_x = [1, 2, 3, 4, 5, 6, 7, 8] * 2
data_y = ['a'] * 8 + ['b'] * 8
data_vals = [0,0,0,3,0,0,2,3,0,0,0,0,0,0,0,3]

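coll = dict()
for (x, y, val) in zip(data_x, data_y, data_vals):
    # Start a new row the first time a y label is seen.
    if coll.get(y) is None:
        coll[y] = []

    # Pad the row with zeros so that index x - 1 exists.
    if x > len(coll[y]):
        coll[y].extend([0] * (x - len(coll[y])))

    coll[y][x - 1] = val

# Collect the rows in sorted order of their y labels.
result = []
for k in sorted(coll):
    result.append(coll[k])

print(coll)
print(result)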

Output:

{'a': [0, 0, 0, 3, 0, 0, 2, 3], 'b': [0, 0, 0, 0, 0, 0, 0, 3]}
[[0, 0, 0, 3, 0, 0, 2, 3], [0, 0, 0, 0, 0, 0, 0, 3]]
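
Since the question asks for a numpy array, the nested list can be converted at the end. This is a minimal sketch, assuming every row ended up with the same length (true for the sample data, not guaranteed for ragged input); the `result` literal simply repeats the output above, and `arr` is an illustrative name.

```python
import numpy as np

# Nested list produced by the loop above (copied from its output).
result = [[0, 0, 0, 3, 0, 0, 2, 3], [0, 0, 0, 0, 0, 0, 0, 3]]

# Rows of equal length convert to a regular 2-D array.
arr = np.array(result)
print(arr.shape)  # (2, 8)
```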
scooter me fecit