For example, I have the following data as a list:
l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]
Now for 'A', 'B', and 'C', I want to get their last occurrences, i.e.:
[['A', 'ac', '3', '60'],
['B', 'bb', '4', '10'],
['C', 'ca', '6', '50']]
or further, the third column in these occurrences, i.e.:
['3', '4', '6']
Currently, the way I deal with this is:
import pandas as pd
df = pd.DataFrame(l, columns=['u', 'w', 'y', 'z'])
df.set_index('u', inplace=True)
ll = []
for letter in df.index.unique():
    # Note: .ix was deprecated in pandas 0.20 and removed in pandas 1.0.
    ll.append(df.ix[letter, 'y'][-1])
When I time this with %timeit, it shows:
>> The slowest run took 27.86 times longer than the fastest.
>> This could mean that an intermediate result is being cached.
>> 1000000 loops, best of 3: 887 ns per loop
Is there a faster way to do this than my code? Thanks!
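For comparison, here is a minimal sketch of one pandas-free approach, assuming the rows are already ordered so that the "last occurrence" is simply the last row seen for each key. It relies on dicts preserving insertion order (Python 3.7+). The names last_rows, last_occurrences and third_column are only illustrative.

```python
# Sample data from the question.
l = [['A', 'aa', '1', '300'],
     ['A', 'ab', '2', '30'],
     ['A', 'ac', '3', '60'],
     ['B', 'ba', '5', '50'],
     ['B', 'bb', '4', '10'],
     ['C', 'ca', '6', '50']]

# Later rows overwrite earlier rows that share the same first element,
# so each key ends up mapped to its last row.
last_rows = {row[0]: row for row in l}

# Keys keep the order in which they first appeared (Python 3.7+).
last_occurrences = list(last_rows.values())

# Third column of each last occurrence.
third_column = [row[2] for row in last_occurrences]

print(last_occurrences)
print(third_column)
```

This makes a single pass over the list and never builds a DataFrame, so it may well be faster for small inputs, but that is worth confirming with %timeit on the real data.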