I have a dataframe as below.
import pandas as pd

df = pd.DataFrame({
    'code': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Tag': ['A', 'B', 'C', 'D', 'B', 'C', 'D', 'A', 'D', 'C']
})
+------+-----+
| code | Tag |
+------+-----+
| 1 | A |
+------+-----+
| 2 | B |
+------+-----+
| 3 | C |
+------+-----+
| 4 | D |
+------+-----+
| 5 | B |
+------+-----+
| 6 | C |
+------+-----+
| 7 | D |
+------+-----+
| 8 | A |
+------+-----+
| 9 | D |
+------+-----+
| 10 | C |
+------+-----+
My objective is to create lists of codes based on the common items in the Tag column, as below.
codes_A = [1,8]
codes_B = [2,5]
codes_C = [3,6,10]
codes_D = [4,7,9]
This is how I'm doing it right now:
codes_A = df[df['Tag'] == 'A']['code'].to_list()
codes_B = df[df['Tag'] == 'B']['code'].to_list()
codes_C = df[df['Tag'] == 'C']['code'].to_list()
codes_D = df[df['Tag'] == 'D']['code'].to_list()
This code does the job, but as you can see it is very cumbersome and inefficient. I'm repeating the same filtering expression for every tag, and I have to repeat it again whenever I want to create a new list.
Is there a more efficient and Pythonic way to do this in pandas or numpy?
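For illustration, here is a minimal sketch of the kind of result I'm after. It assumes a single dict keyed by tag is acceptable in place of separate codes_A, codes_B, ... variables; the name codes_by_tag is hypothetical, and groupby is just one possible route, not necessarily the best one.

```python
import pandas as pd

df = pd.DataFrame({
    'code': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Tag': ['A', 'B', 'C', 'D', 'B', 'C', 'D', 'A', 'D', 'C']
})

# Group the 'code' column by 'Tag' and collect each group's values into a list.
# codes_by_tag is an illustrative name, not something from the code above.
codes_by_tag = df.groupby('Tag')['code'].agg(list).to_dict()

# Each list is looked up by its tag instead of living in its own variable.
codes_A = codes_by_tag['A']
codes_C = codes_by_tag['C']
```

With this shape, a new tag in the data would show up as a new key without any extra line of code.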