I have the Black Friday dataset.
Here is how it looks.
The Age column is given as ranges such as 1-17, 18-25, etc. I want to replace each range with its mean. I could traverse each element of the Age column, parse it, and replace the string value with the mean, but that would probably be inefficient.
Is there a shorter way to do that, or an alternative way to process ranged data like this (in Python, of course)?

DSM

tsukyonomi06
- Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Aug 01 '17 at 13:29
- @jezrael, this kind of question should be migrated to [Cross Validated](https://stats.stackexchange.com/) for effective feedback. Otherwise it's better to close or delete it than to let downvotes accumulate as if we don't have professionals here. – quintumnia Aug 01 '17 at 14:32
1 Answer
There are several ways to transform this variable. In the picture I see that there are not only bins but also the value '55+', which needs to be handled.
1) One-liner:
import numpy as np
df['age'] = df['age'].apply(lambda x: float(x[:-1]) if '+' in x else np.mean([int(v) for v in x.split('-')]))
It checks whether the value contains '+' (like '55+'); if so, the value without the '+' is converted to a number and returned, so the column stays numeric. Otherwise the bin is split into two values, which are converted to ints and averaged. The result is assigned back to the column, because apply does not modify it in place.
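The same logic can be written as a small named function, which is easier to read and test than a lambda. This is only a minimal sketch: the sample frame, the `bin_to_mean` helper and the `age_mean` column are made up for illustration, and treating '55+' as 55 is an assumption carried over from the one-liner.

```python
import pandas as pd

def bin_to_mean(label: str) -> float:
    """Midpoint of an 'a-b' bin, or the lower bound of an open bin like '55+'."""
    label = label.strip()
    if label.endswith('+'):
        # Open-ended bin: there is no upper bound, so fall back to the lower one.
        return float(label[:-1])
    low, high = label.split('-')
    return (int(low) + int(high)) / 2

# Hypothetical sample data standing in for the real column.
df = pd.DataFrame({'age': ['1-17', '18-25', '55+', '18-25']})
df['age_mean'] = df['age'].apply(bin_to_mean)
print(df)
```

Writing to a new column keeps the original labels available in case they are needed later.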
2) Using a dictionary for the transformation:
mapping = {'1-17': 9, '18-25': 21.5, '55+': 55}
df['age'] = df['age'].map(mapping)
You need to add all values to the mapping dictionary (calculate them manually or automatically). Then you apply this transformation to the series. Note that with map, any value missing from the dictionary becomes NaN rather than raising an error.
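Building the dictionary automatically might look like the sketch below: parse each distinct label once, then map the whole column through the result. The `midpoint` helper, the sample data and the `age_mean` column are hypothetical, and '55+' is again assumed to mean 55.

```python
import pandas as pd

def midpoint(label: str) -> float:
    """Midpoint of an 'a-b' bin, or the lower bound of an open bin like '55+'."""
    if label.endswith('+'):
        return float(label[:-1])
    low, high = (int(part) for part in label.split('-'))
    return (low + high) / 2

# Hypothetical sample data standing in for the real column.
df = pd.DataFrame({'age': ['1-17', '18-25', '55+', '18-25', '1-17']})

# Parse each distinct label only once, however many rows there are.
mapping = {label: midpoint(label) for label in df['age'].unique()}

# Dictionary lookup per row; labels missing from the mapping would become NaN.
df['age_mean'] = df['age'].map(mapping)
print(mapping)
```

Since the column holds only a handful of distinct bins, the string parsing runs a few times regardless of the number of rows, which speaks to the efficiency concern in the question.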

Andrey Lukyanenko