I have a dataset in which each record contains the date the user tweeted, their screen name, their follower count, and their friend count. The same user can appear multiple times throughout the dataset, at different times and with different follower/friend counts. What I would like is a list of unique users along with each user's most recent follower/friend counts. Simply de-duplicating on screen name is not enough; I want the values from each user's latest record.
This is what my data currently looks like, including the duplicate entries:
In [14]: data
Out[14]:
[(datetime.datetime(2014, 11, 21, 1, 16, 2), u'_AlexMatosE', 773, 560),
(datetime.datetime(2014, 11, 21, 1, 17, 6), u'hedofthebloom', 670, 618),
(datetime.datetime(2014, 11, 21, 1, 18, 8), u'hedofthebloom', 681, 615),
(datetime.datetime(2014, 11, 21, 1, 19, 1), u'jape2116', 263, 540),
(datetime.datetime(2014, 11, 21, 1, 19, 3), u'_AlexMatosE', 790, 561),
(datetime.datetime(2014, 11, 21, 1, 19, 5), u'Buffmuff69', 292, 270),
(datetime.datetime(2014, 11, 21, 1, 20, 1), u'steveamodu', 140, 369),
(datetime.datetime(2014, 11, 21, 1, 20, 9), u'jape2116', 263, 540),
(datetime.datetime(2014, 11, 21, 1, 21, 3), u'chighway', 363, 767),
(datetime.datetime(2014, 11, 21, 1, 22, 9), u'jape2116', 299, 2000)]
This is how I get the set of unique users in the data (sorting before building a set has no effect, since a set is unordered):
In [15]: users = set(line[1] for line in data)
Now I need to figure out how to get the most recent set of values for each unique user in the dataset. I'm not sure whether a for loop is the best approach here or whether something else would work better.
In [18]: most_recent_user_data = []
    ...: for line in data:
    ...:     if line[1] in users:
    ...:         ...  # not sure how to keep only the newest record for each user here
    ...:         most_recent_user_data.append((line[1], line[2], line[3]))
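One common way to fill in that loop is to keep a dictionary keyed by screen name and overwrite a user's entry only when the current record is newer than the one already stored. This is a minimal sketch, assuming each record is a `(timestamp, screen_name, followers, friends)` tuple as in the data above; the function name `most_recent_by_user` and the variable `latest` are illustrative, not from the question. Because it compares timestamps rather than relying on position, the input does not need to be sorted.

```python
import datetime

def most_recent_by_user(records):
    """Map each screen name to (timestamp, followers, friends) from its newest record."""
    latest = {}
    for timestamp, name, followers, friends in records:
        # Store the record if the user is new, or if it is newer than what we have.
        if name not in latest or timestamp > latest[name][0]:
            latest[name] = (timestamp, followers, friends)
    return latest

dt = datetime.datetime
# Deliberately out of chronological order to show that order does not matter.
data = [
    (dt(2014, 11, 21, 1, 22, 9), u'jape2116', 299, 2000),
    (dt(2014, 11, 21, 1, 17, 6), u'hedofthebloom', 670, 618),
    (dt(2014, 11, 21, 1, 19, 1), u'jape2116', 263, 540),
    (dt(2014, 11, 21, 1, 18, 8), u'hedofthebloom', 681, 615),
]
latest = most_recent_by_user(data)
print(latest[u'jape2116'][1:])  # (299, 2000)
```

This is a single pass over the data (O(n)) with one dictionary lookup per record; if two records for a user share the same timestamp, the first one seen is kept.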
Ultimately, I want to end up with each unique user listed once, along with their most recent follower/friend values:
In [19]: most_recent_user_data
Out[19]:
[(u'hedofthebloom', 681, 615),
(u'_AlexMatosE', 790, 561),
(u'Buffmuff69', 292, 270),
(u'steveamodu', 140, 369),
(u'chighway', 363, 767),
(u'jape2116', 299, 2000)]
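The expected output above lists users ordered by the time of their latest record. A different technique that produces exactly that shape is to walk the records from newest to oldest, keep the first record seen for each user (which is that user's latest), and then reverse the result. This sketch assumes the first row's `AlexMatosE` is meant to be the same user as `_AlexMatosE`, as the expected output suggests; the function name `most_recent_user_data` simply mirrors the variable name in the question.

```python
import datetime

def most_recent_user_data(records):
    """Return one (screen_name, followers, friends) tuple per user from that
    user's newest record, ordered by the time of that record (oldest first)."""
    seen = set()
    newest_first = []
    # Newest records come first, so the first hit for each user is their latest.
    for timestamp, name, followers, friends in sorted(records, key=lambda r: r[0], reverse=True):
        if name not in seen:
            seen.add(name)
            newest_first.append((name, followers, friends))
    # Flip to oldest-first ordering of each user's latest activity.
    newest_first.reverse()
    return newest_first

dt = datetime.datetime
data = [
    (dt(2014, 11, 21, 1, 16, 2), u'_AlexMatosE', 773, 560),
    (dt(2014, 11, 21, 1, 17, 6), u'hedofthebloom', 670, 618),
    (dt(2014, 11, 21, 1, 18, 8), u'hedofthebloom', 681, 615),
    (dt(2014, 11, 21, 1, 19, 1), u'jape2116', 263, 540),
    (dt(2014, 11, 21, 1, 19, 3), u'_AlexMatosE', 790, 561),
    (dt(2014, 11, 21, 1, 19, 5), u'Buffmuff69', 292, 270),
    (dt(2014, 11, 21, 1, 20, 1), u'steveamodu', 140, 369),
    (dt(2014, 11, 21, 1, 20, 9), u'jape2116', 263, 540),
    (dt(2014, 11, 21, 1, 21, 3), u'chighway', 363, 767),
    (dt(2014, 11, 21, 1, 22, 9), u'jape2116', 299, 2000),
]
result = most_recent_user_data(data)
print(result)
```

On this data the result matches the expected list in the question, entry for entry and in the same order. The sort costs O(n log n); if the records are already in chronological order, `reversed(records)` could replace the `sorted(...)` call.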