I am reading data from an RSS feed into a dataframe and trying to convert words (toddlers, kids, adults) to integers that represent min/max ages. I have gotten as far as replacing the words with numerical strings:
df['audience_max'].head(10)
0 10
1 2,4,3
2 2,4,3
3 10,3
4 2,3
5 4
6 4,3
7 4,3
8 10,4,12
9 2,10,4,3
Name: audience_max, dtype: string
Next, I run a for loop over this column to find the maximum value in each list and store it in a new column of the dataframe. I use str.split to turn each string into a list:
maxlist=[]
for x in df['audience_max'].str.split(','):
    maxlist.append(max(x))
df['max_age']=maxlist
This gives the following result. Rows 3, 8, and 9 do not show the maximum value:
0 10
1 4
2 4
3 3
4 3
5 4
6 4
7 4
8 4
9 4
Name: max_age, dtype: object
When I checked the line of code in the for loop that creates the list, it appears to create the list correctly:
(df['audience_max'].str.split(',')).head(10)
0 [10]
1 [2, 4, 3]
2 [2, 4, 3]
3 [10, 3]
4 [2, 3]
5 [4]
6 [4, 3]
7 [4, 3]
8 [10, 4, 12]
9 [2, 10, 4, 3]
Name: audience_max, dtype: object
But max(x) does not seem to return the maximum consistently. Is there a different function that I should be using?
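A likely explanation is that str.split returns lists of strings, not integers, and the printed output simply omits the quotes. max() on strings compares them character by character, so '4' ranks above '10' and '12' because '4' sorts after '1'. The sketch below reproduces this in plain Python on the ten values shown above and converts each piece to int before taking the maximum. The names samples and max_age_from_csv are hypothetical, chosen only for illustration.

```python
# The ten values from df['audience_max'].head(10) in the question.
samples = ["10", "2,4,3", "2,4,3", "10,3", "2,3", "4", "4,3", "4,3", "10,4,12", "2,10,4,3"]

# max() on a list of strings compares text, so "4" beats "10" and "12".
string_max = [max(s.split(",")) for s in samples]


def max_age_from_csv(csv_text):
    """Return the largest integer in a comma-separated string (hypothetical helper)."""
    return max(int(part) for part in csv_text.split(","))


# Converting to int first makes the comparison numeric.
numeric_max = [max_age_from_csv(s) for s in samples]

print(string_max)   # ['10', '4', '4', '3', '3', '4', '4', '4', '4', '4']
print(numeric_max)  # [10, 4, 4, 10, 3, 4, 4, 4, 12, 10]
```

On the dataframe itself, the same helper could presumably be applied with df['max_age'] = df['audience_max'].map(max_age_from_csv), assuming every row holds a non-empty comma-separated string of digits. Missing or empty values would need separate handling.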