How to find a mean between these 2 numbers in 1 column?

Question

How can I find the mean of the two numbers in each cell of one column and update the built_up column with that mean value? I also want to ignore values that are not in a range.

  built_up
0 1498-1602
1 1022-1187
2 1713-1970
3 2305-3396
4 1420
5 -

Here is my data - https://gist.github.com/datomnurdin/21b028b8ed213aacbe4ba4b71ccfe384

I already removed "From" and "sq. ft." using this:

df['built_up'] = df['built_up'].map(lambda x: x.lstrip('From ').rstrip(' sq. ft.'))
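A note on the cleaning step above: `lstrip` and `rstrip` treat their argument as a set of characters to remove, not as a literal prefix or suffix. That happens to be harmless when the remaining text is digits and hyphens, but anchored regular expressions make the intent explicit. The following is a minimal sketch; the sample strings are assumed, since the raw format of the linked data is not shown here.

```python
import pandas as pd

# Hypothetical raw values; the real column may be formatted differently.
raw = pd.Series(["From 1498-1602 sq. ft.", "1420 sq. ft.", "-"], name="built_up")

# Remove a literal leading "From " and a literal trailing " sq. ft."
cleaned = (
    raw.str.replace(r"^From\s+", "", regex=True)
       .str.replace(r"\s*sq\. ft\.$", "", regex=True)
)
print(cleaned.tolist())
```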

I recommend you give this question and its answers a read: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Paul H, Jan 20 '20 at 20:12

Mykola Zotko · Answer 1 · 2020-01-20T20:56:02.213

If you have only two values, you can use mean:

df['built_up'].str.split('-', expand=True).apply(pd.to_numeric, errors='coerce').mean(axis=1)

Output:

0    1550.0
1    1104.5
2    1841.5
3    2850.5
4    1420.0
5       NaN
dtype: float64
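For reference, here is the same one-liner as a self-contained sketch built from the six sample rows in the question. The column name `built_up_mean` is made up for illustration; assigning back to `built_up` instead would overwrite the original strings.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame(
    {"built_up": ["1498-1602", "1022-1187", "1713-1970", "2305-3396", "1420", "-"]}
)

# Split "low-high" into two columns; a single value leaves the second column empty.
parts = df["built_up"].str.split("-", expand=True)

# Anything that is not a number (empty string, missing value) becomes NaN.
numbers = parts.apply(pd.to_numeric, errors="coerce")

# The row-wise mean skips NaN, so a lone value such as "1420" is kept as-is.
df["built_up_mean"] = numbers.mean(axis=1)
print(df)
```

Because `mean` skips NaN by default, a row with one number keeps that number, and a row with no numbers ends up as NaN.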

edited Jan 20 '20 at 20:56

answered Jan 20 '20 at 20:32

I got NaN instead of values. – Nurdin Jan 20 '20 at 20:35
@MohammadNurdin With `median`? – Mykola Zotko Jan 20 '20 at 20:38
ValueError: could not convert string to float: – Nurdin Jan 20 '20 at 20:41
`.fillna(0).astype(float)` – Grzegorz Skibinski Jan 20 '20 at 20:44
@MohammadNurdin You have some problems in your data that we cannot reproduce. – Mykola Zotko Jan 20 '20 at 20:46
You can refer to my data here: https://gist.github.com/datomnurdin/21b028b8ed213aacbe4ba4b71ccfe384 – Nurdin Jan 20 '20 at 20:56
@MohammadNurdin I edited my answer. We cannot clean your data. – Mykola Zotko Jan 20 '20 at 20:57

Andy L. · Accepted Answer · 2020-01-20T21:28:52.067

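Edit: For your real data, you should use str.findall as follows. Note the raw string r'\d+', which avoids an invalid escape sequence warning in recent Python versions.

df['b_median'] = [np.median(pd.to_numeric(x if bool(x) else np.nan, errors='coerce'))
                  for x in df['built_up'].str.findall(r'\d+')]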
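A more explicit variant of the `str.findall` idea can be sketched as follows. The messy sample values and the helper name `mean_of_digit_runs` are hypothetical, chosen only to illustrate the handling of leftover text, stray spaces, and missing cells.

```python
import numpy as np
import pandas as pd

# Hypothetical messy values standing in for the real data.
df = pd.DataFrame(
    {"built_up": ["From 1498-1602 sq. ft.", " 1022 - 1187", "1420 sq. ft.", "-", None]}
)

def mean_of_digit_runs(matches):
    """Average the digit runs found in one cell; NaN when there are none."""
    # str.findall gives a list for strings and a missing value for missing cells.
    if not isinstance(matches, list) or not matches:
        return np.nan
    return float(np.mean([int(m) for m in matches]))

df["built_up_mean"] = df["built_up"].str.findall(r"\d+").map(mean_of_digit_runs)
print(df)
```

One caveat: `\d+` matches every run of digits, so a value with a thousands separator or a decimal point would be split into several numbers and averaged incorrectly. The pattern would need adjusting if the data contains such values.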

Original:

Your real data has some unbalanced strings. Try stripping them before calling map with np.median and pd.to_numeric:

s = (df['built_up'].map(lambda x: 
                        np.median(pd.to_numeric(x.strip('- ').split('-'), errors='coerce'))))

Out[356]:
0    1550.0
1    1104.5
2    1841.5
3    2850.5
4    1420.0
5       NaN
Name: built_up, dtype: float64

Method 2: When processing strings cell by cell in pandas, a list comprehension is often faster than map:

df['b_median'] = [np.mean(pd.to_numeric(x.strip('- ').split('-'), errors='coerce')) 
                       for x in df.built_up]

Out[354]:
    built_up  b_median
0  1498-1602    1550.0
1  1022-1187    1104.5
2  1713-1970    1841.5
3  2305-3396    2850.5
4       1420    1420.0
5          -       NaN
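The speed claim can be checked on your own machine with a rough timing sketch like the one below. The repeated sample data and the function names are assumptions made for illustration, and the outcome depends on data size and library versions, so no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

# Sample rows from the question, repeated to give the timer something to measure.
base = ["1498-1602", "1022-1187", "1713-1970", "2305-3396", "1420", "-"]
df = pd.DataFrame({"built_up": base * 200})

def cell_mean(text):
    """Mean of the numbers in one 'low-high' cell; NaN if there are none."""
    return np.mean(pd.to_numeric(text.strip("- ").split("-"), errors="coerce"))

def with_map():
    return df["built_up"].map(cell_mean)

def with_list_comprehension():
    return pd.Series([cell_mean(x) for x in df["built_up"]], index=df.index)

for fn in (with_map, with_list_comprehension):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

Both functions call the same per-cell helper, so any difference measured here comes from the iteration mechanism rather than from the parsing itself.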
