A common operation in pandas is something like this:
In [14]: import io
    ...: import pandas as pd
In [15]: csv = '''\
    ...: a,b
    ...: 1,2
    ...: 1,3
    ...: 2,3
    ...: 3,1
    ...: 3,3'''
In [16]: dt = pd.read_csv(io.StringIO(csv))
In [17]: dt
Out[17]:
   a  b
0  1  2
1  1  3
2  2  3
3  3  1
4  3  3
In [18]: dt.drop_duplicates(subset=['a'])
Out[18]:
   a  b
0  1  2
2  2  3
3  3  1
How can this be done in SQL? Is there a standard function or approach that does what drop_duplicates(subset=<list>) does?
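One portable way to express this in plain SQL is to find the first row per value of a with GROUP BY and MIN, then join back to get the full rows. Below is a minimal sketch run against an in-memory SQLite database through Python's standard sqlite3 module. The table name dt and the idx column are assumptions for illustration: idx stands in for the pandas index, because SQL rows have no intrinsic order, so "keep the first row" only makes sense relative to some ordering column you choose.

```python
import sqlite3

# In-memory table with the same rows as the DataFrame above.
# ASSUMPTION: "idx" is a hypothetical ordering column standing in for the pandas index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dt (idx INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO dt (idx, a, b) VALUES (?, ?, ?)",
    [(0, 1, 2), (1, 1, 3), (2, 2, 3), (3, 3, 1), (4, 3, 3)],
)

# For each value of a, pick the smallest idx, then join back to fetch the whole row.
# This mirrors drop_duplicates(subset=['a']) with its default keep='first'.
query = """
SELECT dt.idx, dt.a, dt.b
FROM dt
JOIN (SELECT a, MIN(idx) AS first_idx FROM dt GROUP BY a) AS firsts
  ON dt.idx = firsts.first_idx
ORDER BY dt.idx
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(0, 1, 2), (2, 2, 3), (3, 3, 1)]
```

The subquery only uses GROUP BY and an aggregate, so it should work on practically any SQL engine; if you subset on several columns, put all of them in the GROUP BY.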
Edit
For reference, here is what the pandas duplicated() method returns; drop_duplicates(subset=['a']) keeps exactly the rows marked False:
In [20]: dt['a'].duplicated()
Out[20]:
0    False
1     True
2    False
3    False
4     True
Name: a, dtype: bool
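The duplicated() flags themselves can be reproduced with a correlated EXISTS subquery: a row is a duplicate if some earlier row has the same value of a. This is a sketch using SQLite via Python's sqlite3; as before, the table name dt and the ordering column idx are assumptions standing in for the DataFrame and its index.

```python
import sqlite3

# Same sample data; "idx" is an ASSUMED ordering column standing in for the pandas index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dt (idx INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO dt (idx, a, b) VALUES (?, ?, ?)",
    [(0, 1, 2), (1, 1, 3), (2, 2, 3), (3, 3, 1), (4, 3, 3)],
)

# A row is flagged when an earlier row (smaller idx) shares its value of a,
# which matches dt['a'].duplicated() with the default keep='first'.
# Inside the subquery, "dt" refers to the outer row because the inner table is aliased.
query = """
SELECT dt.idx,
       EXISTS (SELECT 1 FROM dt AS earlier
               WHERE earlier.a = dt.a AND earlier.idx < dt.idx) AS is_dup
FROM dt
ORDER BY dt.idx
"""
# SQLite returns EXISTS as 0/1, so convert to bool to compare with pandas.
flags = [(idx, bool(is_dup)) for idx, is_dup in conn.execute(query)]
print(flags)  # [(0, False), (1, True), (2, False), (3, False), (4, True)]
```

Filtering with WHERE NOT EXISTS (...) instead of selecting the flag gives the drop_duplicates result directly.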
In [21]: dt.drop_duplicates(subset=['a'])
Out[21]:
   a  b
0  1  2
2  2  3
3  3  1
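On engines that support window functions (for example PostgreSQL, MySQL 8, SQL Server, and SQLite 3.25 or later), the usual idiom is ROW_NUMBER() partitioned by the subset columns, keeping row number 1. This is a sketch under the same assumptions as above (hypothetical table dt with an ordering column idx), again run through Python's sqlite3.

```python
import sqlite3

# Requires SQLite >= 3.25 for window functions.
# ASSUMPTION: "idx" is a hypothetical ordering column standing in for the pandas index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dt (idx INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO dt (idx, a, b) VALUES (?, ?, ?)",
    [(0, 1, 2), (1, 1, 3), (2, 2, 3), (3, 3, 1), (4, 3, 3)],
)

# Number the rows within each group of equal a, in idx order, and keep the first.
# PARTITION BY plays the role of subset=[...]; ORDER BY decides which row counts as "first".
query = """
SELECT idx, a, b
FROM (
    SELECT idx, a, b,
           ROW_NUMBER() OVER (PARTITION BY a ORDER BY idx) AS rn
    FROM dt
) AS numbered
WHERE rn = 1
ORDER BY idx
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(0, 1, 2), (2, 2, 3), (3, 3, 1)]
```

Changing the window to ORDER BY idx DESC keeps the last row per group instead, roughly like keep='last'.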