How do I compute the mean of the two numbers in one column and update the built_up column with that mean? Values that are not in range should also be ignored.

  built_up
0 1498-1602
1 1022-1187
2 1713-1970
3 2305-3396
4 1420
5 -
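To reproduce the problem without downloading the gist, the six sample rows above can be rebuilt as a small DataFrame. This is only a sketch of the displayed sample; the full data set is in the gist and may contain other formats.

```python
import pandas as pd

# The six sample rows shown above. The column holds strings, not numbers,
# because of the "-" separators and the lone "-" entry.
df = pd.DataFrame({"built_up": ["1498-1602", "1022-1187", "1713-1970",
                                "2305-3396", "1420", "-"]})
print(df)
```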

Here is my data: https://gist.github.com/datomnurdin/21b028b8ed213aacbe4ba4b71ccfe384

I already removed the leading "From " and the trailing " sq. ft." using this:

df['built_up'] = df['built_up'].map(lambda x: x.lstrip('From ').rstrip(' sq. ft.'))
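Note that str.lstrip and str.rstrip remove a *set of characters*, not a literal prefix or suffix. The line above works only because digits are not in either set. A stricter sketch uses anchored regular expressions instead. The raw values below are hypothetical, assuming the original strings look like "From 1498-1602 sq. ft."; the gist data may be formatted differently.

```python
import pandas as pd

# lstrip/rstrip treat their argument as a set of characters: "From " strips
# any leading F, r, o, m or space, so every character before "5" is removed.
print("room 5".lstrip("From "))  # prints 5

# Hypothetical raw values (assumed format, not copied from the gist).
raw = pd.Series(["From 1498-1602 sq. ft.", "1420 sq. ft.", "-"])

# Anchored regexes remove only the literal prefix and the literal suffix.
cleaned = (raw.str.replace(r"^From\s+", "", regex=True)
              .str.replace(r"\s*sq\. ft\.$", "", regex=True))
print(cleaned.tolist())  # ['1498-1602', '1420', '-']
```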
halfer
Nurdin

2 Answers

If each entry holds at most two values, you can split on "-" and take the mean of each row:

import pandas as pd
df['built_up'].str.split('-', expand=True).apply(pd.to_numeric, errors='coerce').mean(axis=1)

Output:

0    1550.0
1    1104.5
2    1841.5
3    2850.5
4    1420.0
5       NaN
dtype: float64
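The one-liner above only computes the means; to actually update built_up, as the question asks, the result can be assigned back to the column. A minimal sketch on the six sample rows from the question:

```python
import pandas as pd

df = pd.DataFrame({"built_up": ["1498-1602", "1022-1187", "1713-1970",
                                "2305-3396", "1420", "-"]})

# Split on "-", coerce every piece to a number (empty strings and missing
# second halves become NaN), then average across each row. mean() skips NaN,
# so a single value is kept as-is and "-" ends up as NaN.
df["built_up"] = (df["built_up"].str.split("-", expand=True)
                  .apply(pd.to_numeric, errors="coerce")
                  .mean(axis=1))
print(df)
```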
Mykola Zotko
Edit: For your real data, use str.findall instead:

import numpy as np
import pandas as pd
df['b_median'] = [np.median(pd.to_numeric(x if bool(x) else np.nan, errors='coerce'))
                  for x in df['built_up'].str.findall(r'\d+')]
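A quick sketch of why findall is more robust: it extracts every run of digits, so stray spaces and dangling dashes no longer matter. The messy values below are hypothetical examples, not taken from the gist.

```python
import numpy as np
import pandas as pd

# Hypothetical messy values: extra spaces, a leading "-", and a lone "-".
df = pd.DataFrame({"built_up": ["1498 - 1602", "1022-1187 ", "-1420", "-"]})

# Every run of digits is pulled out; an empty match list (no digits) -> NaN.
nums = df["built_up"].str.findall(r"\d+")
df["b_median"] = [np.median(pd.to_numeric(x)) if x else np.nan for x in nums]
print(df)
```

One caveat: r'\d+' would split a number written with a thousands separator, such as "1,498", into two separate numbers.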

Original:

Your real data contain some unbalanced strings (stray "-" or spaces at the ends), so strip them before splitting, then map each value through pd.to_numeric and np.median:

import numpy as np
import pandas as pd
s = df['built_up'].map(
    lambda x: np.median(pd.to_numeric(x.strip('- ').split('-'), errors='coerce')))

Out[356]:
0    1550.0
1    1104.5
2    1841.5
3    2850.5
4    1420.0
5       NaN
Name: built_up, dtype: float64
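To see why the strip('- ') call matters, here is a sketch using a hypothetical helper, range_median, and a hypothetical unbalanced value "-1420" (not taken from the gist). Without stripping, the leading dash produces an empty piece that coerces to NaN, and np.median then returns NaN for the whole value.

```python
import numpy as np
import pandas as pd


def range_median(value, strip=True):
    """Median of the numbers in a 'low-high' string (hypothetical helper)."""
    if strip:
        value = value.strip("- ")  # drop dangling dashes/spaces at both ends
    return np.median(pd.to_numeric(value.split("-"), errors="coerce"))


# Unstripped: ['', '1420'] -> [nan, 1420.0] -> median is nan
print(range_median("-1420", strip=False))
# Stripped: ['1420'] -> median is 1420.0
print(range_median("-1420"))
```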

Method 2: when processing strings in pandas cells, a list comprehension is usually faster than map:

import numpy as np
import pandas as pd
df['b_median'] = [np.mean(pd.to_numeric(x.strip('- ').split('-'), errors='coerce'))
                  for x in df.built_up]

Out[354]:
    built_up  b_median
0  1498-1602    1550.0
1  1022-1187    1104.5
2  1713-1970    1841.5
3  2305-3396    2850.5
4       1420    1420.0
5          -       NaN
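The speed claim is easy to check on your own data. A rough timing sketch, using synthetic data built by repeating the six sample rows (real timings depend on data size, hardware, and pandas version, so no numbers are given here):

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic benchmark data: the six sample rows repeated 1,000 times.
df = pd.DataFrame({"built_up": ["1498-1602", "1022-1187", "1713-1970",
                                "2305-3396", "1420", "-"] * 1_000})


def to_mean(x):
    # Same per-cell logic as above: strip stray '-'/' ', split, coerce, average.
    return np.mean(pd.to_numeric(x.strip("- ").split("-"), errors="coerce"))


def with_map():
    return df["built_up"].map(to_mean)


def with_listcomp():
    return [to_mean(x) for x in df["built_up"]]


# Both versions call to_mean once per cell, so any gap comes from the
# overhead around those calls.
print("map:     ", timeit.timeit(with_map, number=3))
print("listcomp:", timeit.timeit(with_listcomp, number=3))
```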
Andy L.