Given a matrix stored in an SFrame:
>>> from sframe import SFrame
>>> sf = SFrame({'x': [1, 1, 2, 5, 7], 'y': [2, 4, 6, 8, 2], 'z': [2, 5, 8, 6, 2]})
>>> sf
Columns:
x int
y int
z int
Rows: 5
Data:
+---+---+---+
| x | y | z |
+---+---+---+
| 1 | 2 | 2 |
| 1 | 4 | 5 |
| 2 | 6 | 8 |
| 5 | 8 | 6 |
| 7 | 2 | 2 |
+---+---+---+
[5 rows x 3 columns]
I want to get the unique values across the x and y columns, and I can do it like this:
>>> sf['x'].unique().append(sf['y'].unique()).unique()
dtype: int
Rows: 7
[2, 8, 5, 4, 1, 7, 6]
This gets the unique values of x and the unique values of y, appends them, and then takes the unique values of the appended result.
I could also do it like this:
>>> sf['x'].append(sf['y']).unique()
dtype: int
Rows: 7
[2, 8, 5, 4, 1, 7, 6]
But that way, if my x and y columns are huge and contain many duplicates, I would be appending them into one very large container before deduplicating.
Is there a more efficient way to get the unique values of a combined column created from 2 or more columns in an SFrame?
What is the pandas equivalent of an efficient way to get unique values from 2 or more columns?
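For comparison, here is a minimal pandas sketch of the same two approaches. It assumes a DataFrame `df` with the same columns as the SFrame above. The variable names are illustrative, and this is not claimed to be the most efficient option; I have not benchmarked either variant.

```python
import pandas as pd

# Same data as the SFrame example above.
df = pd.DataFrame({'x': [1, 1, 2, 5, 7],
                   'y': [2, 4, 6, 8, 2],
                   'z': [2, 5, 8, 6, 2]})

# Approach 1: deduplicate each column first, then concatenate the
# (smaller) results and deduplicate once more.
per_column_first = pd.concat(
    [df['x'].drop_duplicates(), df['y'].drop_duplicates()],
    ignore_index=True,
).unique()

# Approach 2: concatenate the full columns, then deduplicate once.
# The intermediate Series holds every value of both columns.
concat_first = pd.concat([df['x'], df['y']], ignore_index=True).unique()

# Both results hold the same 7 distinct values: 1, 2, 4, 5, 6, 7, 8.
print(sorted(per_column_first.tolist()))
print(sorted(concat_first.tolist()))
```

As with the SFrame versions, the first variant keeps the intermediate container small when the columns contain many duplicates, at the cost of one extra deduplication pass per column.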