Here is the standard example that I would like to apply to a dataframe.

Standard example applied to an array, with the desired output:

import numpy as np

A = np.array([9,2,9,5])

C, ia, ic = np.unique(A, return_index=True, return_inverse=True)  

print(C)
print(ia)
print(ic)

output

[2 5 9]
[1 3 0]
[2 0 2 1]

How can I extend that example to a DataFrame? Conceptually, I would like to achieve the same result, but with several DataFrame columns in place of A. The code below did not work for me:

C, ia, ic = np.unique(DF[['column1', 'column2', 'column3']], return_index=True, return_inverse=True)

I also tried the following, but I am not sure it gives me the right answer:

C, ia, ic = np.unique(DF[['column1', 'column2', 'column3']].values, return_index=True, return_inverse=True)

Any help is more than welcome

SBad
  • You are working with a 2D array, so you need [this](https://stackoverflow.com/q/16970982) – jezrael May 20 '19 at 12:52
  • You can use `df.apply(lambda col: np.unique(col, return_index=True, return_inverse=True), axis=0)` – Benjamin Breton May 20 '19 at 12:53
  • When I apply your code to a sample df, `df = pd.DataFrame({'b': [8, 4, 5, 2, 8, 2], 'c': [3, 0, 9, 3, 3, 3], 'd': [5, 8, 9, 3, 5, 3]}, index=[1, 2, 3, 4, 5, 6])`, with `C, ia, ic = df.apply(lambda col: np.unique(col, return_index=True, return_inverse=True), axis=0)`, I get `C = (array([2, 4, 5, 8]), array([3, 1, 2, 0]), array([3, 1, 2, 0, 3, 0]))`, `ia = (array([0, 3, 9]), array([1, 0, 2]), array([1, 0, 2, 1, 1, 1]))` and `ic = (array([3, 5, 8, 9]), array([3, 0, 1, 2]), array([1, 2, 3, 0, 1, 0]))`. I was hoping to see C as a matrix of unique rows, and ia and ic as vectors of indices. – SBad May 20 '19 at 13:13
  • Thanks jezrael, the link helped me find my answer: `C, ia, ic = np.unique(df, return_index=True, return_inverse=True, axis=0)` – SBad May 20 '19 at 13:22
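
For anyone landing here later, this is a minimal sketch of the `axis=0` approach from the last comment, run on the sample frame from the comments. It also shows the flattened call for contrast: without `axis`, `np.unique` flattens the 2D values and returns unique scalars rather than unique rows. The names `flat`, `first_rows` and `rebuilt` are only illustrative, and the `np.ravel` call is a precaution I am assuming is harmless, since some NumPy releases may return the inverse with an extra dimension when `axis` is given.

```python
import numpy as np
import pandas as pd

# Sample frame taken from the comments above.
df = pd.DataFrame(
    {'b': [8, 4, 5, 2, 8, 2], 'c': [3, 0, 9, 3, 3, 3], 'd': [5, 8, 9, 3, 5, 3]},
    index=[1, 2, 3, 4, 5, 6],
)

# Without axis, the 2D values are flattened: unique scalars, not unique rows.
flat = np.unique(df.values)

# With axis=0, each row is treated as a single item.
C, ia, ic = np.unique(df.values, return_index=True, return_inverse=True, axis=0)
ic = np.ravel(ic)  # assumption: keeps ic 1-D regardless of NumPy version

# ia and ic are 0-based positions, not index labels, so use iloc.
first_rows = df.iloc[ia]   # first occurrence of each unique row, original labels kept
rebuilt = C[ic]            # reconstructs the original rows from the unique ones

print(C)
print(ia)
print(ic)
```

Note that `ia` and `ic` refer to row positions, so with a non-default index such as 1 to 6 here, `df.iloc[ia]` is what maps them back to the original rows.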
