1

I want to replace each value in a Pandas DataFrame column, where every cell holds a range as a string, with the mean of that range (or the bare number when the value is open-ended).

Column values: each cell holds one string, for example one cell is “46-55”, the next is “26-35”, and the next is “100+” (without the quotes).

Example input: pd.Series(['46-55', '26-35', '60+'])


Expected output: pd.Series(['50.5', '30.5', '60'])


where 50.5 is the mean of 46 and 55, and an open-ended value such as 60+ becomes 60.

batman
  • Possible duplicate of [pandas: split string, and count values?](https://stackoverflow.com/questions/48508573/pandas-split-string-and-count-values) – oreopot Oct 09 '19 at 03:23

2 Answers

0
import numpy as np
import pandas as pd

a = pd.Series(['46-55', '26-35', '60+'])
>>> a
0    46-55
1    26-35
2      60+
dtype: object

We can extract all the numbers from each cell's string with str.findall, which returns a new Series holding one list of digit strings per cell:

b = a.str.findall(r'(\d+)')
>>> b
0    [46, 55]
1    [26, 35]
2        [60]
dtype: object

Now we can compute the mean of each list in the new Series, which gives the required Series:

# Convert each list of digit strings to floats, then take its mean
c = b.apply(lambda nums: np.array(nums).astype(float).mean())

>>> c
0    50.5
1    30.5
2    60.0
dtype: float64
Nitin Singh
  • You are taking whole range as all the values in column, this is not what was asked, scenario is value of single cell is “46-55” next cell value is “26-35” and next cell value is “100+” (without quotes) – batman Oct 09 '19 at 04:29
  • @John, I have edited my response. I had misinterpreted the question earlier. I hope this will help in what you were looking for. – Nitin Singh Oct 09 '19 at 07:55
0
>>> import pandas as pd

Input

# assumes age_range contains no noisy data (e.g. no special characters other than '-' and '+')
>>> age_range = pd.Series(('46-55', '26-35', '60+'))
>>> age_range
0    46-55
1    26-35
2      60+
dtype: object

Function to convert each range string into a list of ints

>>> split_range = lambda age_range : [[int(y) for y in x.split('-')] if len(x.split('-')) == 2 else [int(x.split('+')[0])] for x in age_range]

# test the function
>>> alter_age_range = split_range(age_range)
>>> alter_age_range
[[46, 55], [26, 35], [60]]

Output

>>> ages_mean = pd.Series([sum(ages)/len(ages) for ages in split_range(age_range)])
>>> ages_mean
0    50.5
1    30.5
2    60.0
dtype: float64
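
Since the question is about a DataFrame column, the same idea could be applied to the column directly. The following is a minimal sketch, not part of the answer above: the column names age_range and age_mean and the helper range_to_mean are hypothetical, and it assumes the same clean input (only digits, '-' and a trailing '+').

```python
import pandas as pd

def range_to_mean(value):
    """Return the mean of a 'lo-hi' string, or the number itself for 'N+'."""
    # Drop a trailing '+', then split on '-' to get one or two bounds
    bounds = [int(part) for part in value.rstrip('+').split('-')]
    return sum(bounds) / len(bounds)

df = pd.DataFrame({'age_range': ['46-55', '26-35', '60+']})

# Write the means to a new column; assign to 'age_range' instead to overwrite
df['age_mean'] = df['age_range'].map(range_to_mean)

print(df)
```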
Dishin H Goyani