I have the Black Friday dataset. Here is how it looks.
The Age column is given as ranges such as 1-17, 18-25, etc. I want to replace each range with its mean. I could traverse every element of the Age column, parse it, and replace the string value with the mean, but that would probably be inefficient.

So is there a shorter way to do that, or an alternative way to process ranged data (in Python, of course)?

DSM
tsukyonomi06
    Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Aug 01 '17 at 13:29
    @jezrael, this kind of question should be migrated to [Cross Validated](https://stats.stackexchange.com/) for effective feedback. Otherwise it's better to close or delete it than to let downvotes accumulate, as if we don't have professionals here. – quintumnia Aug 01 '17 at 14:32

1 Answer

There are several ways to transform this variable. In the picture I see that there are not only bins but also the value '55+', which needs to be handled.

1) One-liner:

import numpy as np
df['age'].apply(lambda x: np.mean([int(v) for v in x.split('-')]) if '+' not in x else int(x[:-1]))

It checks whether the value contains '+' (like '55+'); if so, the value without the '+' is returned as an int. Otherwise the bin is split into two values, which are converted to ints and averaged.

2) Using a dictionary for the transformation:

mapping = {'1-17': 9, '18-25': 21.5, '55+': 55}
df['age'].apply(lambda x: mapping[x])

You need to add every distinct value of the column to the mapping dictionary (computing the means manually or automatically). Then you apply this transformation to the series.
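As a rough sketch of the "automatically" option, the mapping can be built from the distinct labels in the column, so each label is parsed only once no matter how many rows there are. The helper `range_to_mean`, the sample data, and the output column `age_mean` are made up for illustration and are not part of the original dataset; treating '55+' as 55 follows the mapping above and is itself an assumption about how open-ended bins should be handled.

```python
import pandas as pd

def range_to_mean(label):
    """Midpoint of an 'a-b' label, or the lower bound of an open 'a+' label."""
    if label.endswith('+'):
        return float(label[:-1])
    low, high = label.split('-')
    return (int(low) + int(high)) / 2

# Hypothetical sample data; the real column contains many repeated labels.
df = pd.DataFrame({'age': ['1-17', '18-25', '55+', '18-25', '1-17']})

# Parse each distinct label once, then look the result up for every row.
mapping = {label: range_to_mean(label) for label in df['age'].unique()}
df['age_mean'] = df['age'].map(mapping)

print(mapping)
print(df)
```

`Series.map` with a dictionary yields NaN for any label missing from the dictionary, whereas `mapping[x]` inside `apply` raises a `KeyError`. Building the dictionary from `unique()` avoids both cases.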

Andrey Lukyanenko