I'm working with an array called numbers that has 4 columns: x, y and z, plus a fourth column that is used elsewhere in the program.

If the x and y values of two rows coincide, I want one of the two rows to be deleted from the main array based on their z values (a "0" z value removes "1", a "1" z value removes "2" and a "2" z value removes "0").

The original array looks like:

[[12 15  2  0]
 [65 23  0  0]
 [24 66  2  0]
 [65 23  1  0]
 [24 66  0  0]]

The problem is that when I run the following program, I do not get the required array at the end. The expected output array would look like:

[[12 15  2  0]
 [65 23  0  0]
 [24 66  2  0]]

An extract from the program is given below:

import numpy as np

#Array
numbers = np.array([[12,15,2,0],[65,23,0,0],[24,66,2,0],[65,23,1,0],[24,66,0,0]])

#Original Array
print(numbers)

#Lists to store x, y and z values
xs = []
ys = []
zs = []

#Any removed row is added into this list
removed = []

#Code to delete a row
for line1 in numbers:
    for line2 in numbers:
        if line1[0] == line2[0]:
            if line2[1] == line2[1]:
                if line1[2] == 1 and line2[2] == 0:    
                    removed.append(line1)
                if line1[2] == 0 and line2[2] == 2:    
                    removed.append(line1)
                if line1[2] == 2 and line2[2] == 1:    
                    removed.append(line1)

for i in removed:
    numbers = np.delete(numbers,i,axis=0)

for line in numbers:                        
    xs.append(line[0])
    ys.append(line[1])
    zs.append(line[2])

#Update the original Array
for i in removed:
    print(removed)

print()
print("x\n", xs)
print("y\n", ys)
print("z\n", zs)
print()
#Updated Array
print(numbers)

2 Answers

If you can use pandas, you can do the following:

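import numpy as np
import pandas as pd

x = np.array([[12,15,2,0],[65,23,0,1],[24,66,2,0],[65,23,1,0],[24,66,0,0]])
df = pd.DataFrame(x)
# keep the first row for every distinct pair of values in columns 0 and 1
new_df = df.iloc[df.loc[:,(0,1)].drop_duplicates().index]
print(new_df)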

    0   1  2  3
0  12  15  2  0
1  65  23  0  1
2  24  66  2  0

What it does is the following:

  1. transform the array into a pandas DataFrame
  2. df.loc[:,(0,1)].drop_duplicates().index returns the indices of the rows you wish to keep (based on the first and second columns)
  3. df.iloc returns the sliced DataFrame.
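
As a small illustration of how drop_duplicates chooses which row survives, the sketch below uses its subset and keep parameters on the same array. The variable names first and last are only for this example. Note that the choice depends on row order alone, not on the value in the third column.

```python
import numpy as np
import pandas as pd

x = np.array([[12,15,2,0],[65,23,0,1],[24,66,2,0],[65,23,1,0],[24,66,0,0]])
df = pd.DataFrame(x)

# keep="first" (the default) keeps the earliest row of each (x, y) pair
first = df.drop_duplicates(subset=[0, 1], keep="first")

# keep="last" keeps the latest row of each (x, y) pair instead
last = df.drop_duplicates(subset=[0, 1], keep="last")

print(first.index.tolist())
print(last.index.tolist())
```

With keep="first" the rows at positions 0, 1 and 2 survive; with keep="last" the rows at positions 0, 3 and 4 survive.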

Edit based on OP questions in the comments and @wwii remarks:

  1. you can return to numpy array using .to_numpy(), so just do arr = new_df.to_numpy()

  2. You can try the following:

    xx = np.array([[12,15,2,0],[65,23,1,0],[24,66,2,0],[65,23,0,0],[24,66,0,0]])
    df = pd.DataFrame(xx)
    df_new = df.groupby([0,1], group_keys=False).apply(lambda x: x.loc[x[2].idxmin()])
    df_new.reset_index(drop=True, inplace=True)
    
        0   1  2  3
    0  12  15  2  0
    1  24  66  0  0
    2  65  23  0  0
    

When there is a special heuristic to consider one can do the following:

import pandas as pd
import numpy as np

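def f_(x):
    # Return the value of column 2 that should survive within group x
    vals = x[2].tolist()
    if len(vals)==2:
        # 0 removes 1
        if vals[0] == 0 and vals[1] == 1:
            return vals[0]
        elif vals[0] == 1 and vals[1] == 0:
            return vals[1]
        # 1 removes 2
        elif vals[0] == 1 and vals[1] == 2:
            return vals[0]
        elif vals[0] == 2 and vals[1] == 1:
            return vals[1]
        # 2 removes 0
        elif vals[0] == 2 and vals[1] == 0:
            return vals[0]
        elif vals[0] == 0 and vals[1] == 2:
            return vals[1]
    elif len(vals) > 2:
        # no row has -1 in column 2, so groups with more than two rows are dropped
        return -1
    else:
        # single row: comparing the column with itself keeps the row
        return x[2]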

xx = np.array([[12,15,2,0],[65,23,1,0],[24,66,2,0],[65,23,0,0],[24,66,0,0]])
df = pd.DataFrame(xx)
df_new = df.groupby([0,1], group_keys=False).apply(lambda x: x.loc[x[2] == f_(x)])
df_new.reset_index(drop=True, inplace=True)
print(df_new)

    0   1  2  3
0  12  15  2  0
1  24  66  2  0
2  65  23  0  0
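
The same selection can also be sketched without spelling out each pair, by using the cyclic nature of the rule: a value a removes b when b == (a + 1) % 3. This is only a sketch under assumptions: the helper name winner is hypothetical, the third column is assumed to contain only 0, 1 or 2, and groups containing all three values are dropped.

```python
import numpy as np
import pandas as pd

def winner(values):
    """Return the surviving value for one group, or None if there is no single winner."""
    unique = sorted(set(values))
    if len(unique) == 1:
        return unique[0]
    if len(unique) == 2:
        a, b = unique
        # a removes b when b == (a + 1) % 3, otherwise b removes a
        return a if b == (a + 1) % 3 else b
    return None  # all three values present: no single winner

xx = np.array([[12,15,2,0],[65,23,1,0],[24,66,2,0],[65,23,0,0],[24,66,0,0]])
df = pd.DataFrame(xx)

keep = []
for _, group in df.groupby([0, 1]):
    w = winner(group[2].tolist())
    if w is not None:
        # remember the index labels of the rows holding the winning value
        keep.extend(group.loc[group[2] == w].index)

df_new = df.loc[sorted(keep)].reset_index(drop=True)
print(df_new)
```

For this input the rows kept are [12, 15, 2, 0], [24, 66, 2, 0] and [65, 23, 0, 0], regardless of the order in which the duplicated rows appear.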
David
  • @wwii from the example output it doesn't look like it. Look at the second row – David Oct 23 '20 at 12:45
  • My mistake - does your solution depend on row order? Does drop_duplicates always keep the first one? – wwii Oct 23 '20 at 12:47
  • @wwii in the docs: https://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.drop_duplicates.html it says that it keeps the first, but you can change it in case of need – David Oct 23 '20 at 12:49
  • @DavidS the code is genius!!! It seems to be solving most of the issues, but I have a few questions. 1) Can I transform new_df into a new updated lifeform array? and 2) How did you manage to specify which color codes to remove? – Ibrahim Malik Oct 23 '20 at 12:51
  • Answer depends on the order of the original array; it makes no attempt to select the row based on the question's *color* criteria. It fails with `np.array([[12,15,2,0],[65,23,1,0],[24,66,2,0],[65,23,0,0],[24,66,0,0]])` – wwii Oct 23 '20 at 13:15
  • @wwii please see my edit, where it now depends on the third column – David Oct 23 '20 at 13:24
  • `a "0" c value beats "1", a "1" c value beats "2" and a "2" c value beats "0"` – wwii Oct 23 '20 at 13:36
  • @wwii edited to `idxmin`. But it contradicts the expected output he supplied. – David Oct 23 '20 at 13:37
  • @wwii I would appreciate if you consider undoing the downvote. or alternatively upvote the answer – David Oct 23 '20 at 13:45
  • I still don't think it works. The selection criteria isn't as simple as min or max. It is : min if comparing (0,1) or (1,2) and max if comparing (2,0). – wwii Oct 23 '20 at 13:50
  • @wwii So instead of applying `idxmin()`, the OP can pass `apply` a function that receives each group and implements the desired behavior. – David Oct 23 '20 at 14:06
  • @DavidS could you try demonstrating the entire code including the color codes. – Ibrahim Malik Oct 23 '20 at 14:10
  • @wwii as I said, when there is a complex conditioning he can simply do as I added in my edit. – David Oct 23 '20 at 14:29
  • @IbrahimMalik just to mention it, the code supplied last will return the same output as what wwii will – David Oct 23 '20 at 16:14

Test array

import numpy as np
import pandas as pd

a = lifeforms = np.array([[12,15,2,0],
                          [13,13,0,0],
                          [13,13,1,0],
                          [13,13,2,0],
                          [65,23,1,0],
                          [24,66,2,0],
                          [14,14,1,0],
                          [14,14,1,1],
                          [14,14,1,2],
                          [14,14,2,0],
                          [15,15,3,2],
                          [15,15,2,0],
                          [65,23,0,0],
                          [24,66,0,0]])

Function that implements color selection.

test_one = np.array([[0,1],[1,0],[1,2],[2,1]])
test_two = np.array([[0,2],[2,0]])

def f(g):
    a = g.loc[:,2].unique()
    if np.any(np.all(a == test_one, axis=1)):
        idx = (g[2] == g[2].min()).idxmax()
    elif np.any(np.all(a == test_two, axis=1)):
        idx = (g[2] == g[2].max()).idxmax()
    else:
        raise ValueError('group colors outside bounds')
    return idx

Group by the first two columns, iterate over the groups, save the index of the desired row in each group, then use those indices to select rows from the DataFrame.

df = pd.DataFrame(a)
gb = df.groupby([0,1])
sep = '-' * 40  # separator line for the diagnostic prints below

indices = []
for k,g in gb:
    if g.loc[:,2].unique().shape[0] > 2:
        #print(f'(0,1,2) - dropping indices {g.index}')
        continue
    if g.shape[0] == 1:
        indices.extend(g.index.to_list())
        #print(f'unique - keeping index {g.index.values}')
        continue
    #print(g.loc[:,2])
    try:
        idx = f(g)
    except ValueError as e:
        print(sep)
        print(e)
        print(g)
        print(sep)
        continue 
    #print(f'keeping index {idx}')
    indices.append(idx)
    #print(sep)

print(df.loc[indices,:])
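
If pandas is not an option, a similar selection can be sketched in plain NumPy with a boolean mask. np.delete expects integer indices rather than row contents, so masking avoids that step altogether. The sketch assumes the third column holds only 0, 1 or 2 and uses the array from the question; the names same_xy, beaten and drop are only for illustration.

```python
import numpy as np

numbers = np.array([[12, 15, 2, 0],
                    [65, 23, 0, 0],
                    [24, 66, 2, 0],
                    [65, 23, 1, 0],
                    [24, 66, 0, 0]])

x, y, z = numbers[:, 0], numbers[:, 1], numbers[:, 2]

# same_xy[i, j] is True when rows i and j share both x and y
same_xy = (x[:, None] == x[None, :]) & (y[:, None] == y[None, :])

# beaten[i, j] is True when row j's z value removes row i's z value,
# i.e. z[i] == (z[j] + 1) % 3
beaten = z[:, None] == (z[None, :] + 1) % 3

# a row is dropped if any row with the same (x, y) removes it
drop = (same_xy & beaten).any(axis=1)

result = numbers[~drop]
print(result)
```

With this rule a group that contains all three values loses every row, because each value is removed by another one. The pairwise comparison builds n-by-n arrays, so it is only suitable for arrays of modest size.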
wwii