average of two values in a dataframe column

Question

I am trying to get the average value of Price

import pandas as pd
df = pd.read_csv('sample_data1.csv')

#file sample

Name,Price
Eedor,"￥1,680"
Avidlove,"￥11,761"
Fitment,
Vintage,$8.95 - $16.95
silhouette,$27.80 - $69.50
Silk,$50.02

I am trying to get the average value in the "Price" column, then if it is in Yen convert to USD I have written this small function which should do the job, I am not sure how can I apply it on the column

import re
#1￥ =0.0090$
def my_func(value):
    if not value:
        return None #remove row
    elif "￥" in value:
        try:
            temp = re.search(r'(\d+\,*\.*\d*) - .(\d+\,*\.*\d*)',value).groups()
            return (float(temp[0].replace(',',''))+float(temp[1].replace(',','')))*0.09/2
        except:
            return float(re.search(r'(\d+\,*\.*\d*)',value).groups()[0].replace(',',''))*0.009
    else:
        try:
            temp = re.search(r'(\d+\,*\.*\d*) - .(\d+\,*\.*\d*)',value).groups()
            return (temp[0]+temp[1])/2
        except:
            return float(re.search(r'(\d+\,*\.*\d*)',value).groups()[0].replace(',',''))

What I want is to replace the price column with the average value in $

_I am not sure how can I apply it on the column_ What do you mean? Is this a duplicate of https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe? This seems like the kind of problem which can be resolved by reading the Pandas docs. What's with the format of that data, it looks inconsistent. By the way, using a bare `except` statement like this is bad practice, see https://stackoverflow.com/questions/54948548/what-is-wrong-with-using-a-bare-except, for example. — AMC, Feb 26 '20 at 18:35
@AMC I know that it can only result in `AttributeError` exception, and on the question you mention is unrelated — , Feb 26 '20 at 18:39
How is it unrelated? If the error you're trying to catch is an AttributeError, why not make it safer and more explicit by using `except AttributeError` ? — AMC, Feb 26 '20 at 18:42

score 0 · Accepted Answer · answered Feb 26 '20 at 16:38

This does what you want, without the currency symbol:

df['average'] = df.Price.str.replace(',','').str.extractall('([\d\.]+)').astype(float)[0].mean(level=0)

Output:

         Name            Price   average
0       Eedor           ￥1,680   1680.00
1    Avidlove          ￥11,761  11761.00
2     Fitment              NaN       NaN
3     Vintage   $8.95 - $16.95     12.95
4  silhouette  $27.80 - $69.50     48.65
5        Silk           $50.02     50.02

To correct the yen:

df['average'] = np.where(df.Price.str[:1].eq('￥'), 
                         df['average']*Yen_to_USD_rate, 
                         df['average'])

average of two values in a dataframe column

1 Answers1