1

I want to convert each row of a data frame to an array and pass that array to a function. The function's result for each row should be stored in a new column.

def harmonicMean(arr):
    total = 0.0  # renamed from sum to avoid shadowing the built-in
    for item in arr:
        total = total + 1.0 / item
        print("inside" + str(1.0 / item))  # debug output for each reciprocal
    print(total)
    return len(arr) / total

The function computes the harmonic mean of each row in the data frame. These values should be written to a new column in the data frame. (The data frame also contains NaN values.)

Jack Moody
Jagruthi C
  • Can you provide more information, such as a data sample (e.g. `df.head()`), what you tried, and your desired output? – Terry Apr 08 '19 at 16:18

2 Answers

3

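You can calculate the harmonic mean of every row without iterating over the rows. `df.notnull().sum(axis=1)` counts the non-null values in each row, and `(1/df).sum(axis=1)` sums their reciprocals, skipping NaN by default: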

df['hmean'] = df.notnull().sum(axis=1)/(1/df).sum(axis=1)

   a    b    c     d   e     hmean
0  4  5.0  2.0   5.0  10  4.000000
1  2  8.0  1.0   8.0   6  2.608696
2  7  NaN  1.0   1.0   8  1.763780
3  7  1.0  9.0   4.0   9  3.095823
4  8  5.0  8.0   NaN   3  5.106383
5  3  8.0  6.0  10.0   6  5.607477
6  3  7.0  3.0   9.0   9  4.846154
7  8  NaN  NaN   NaN   6  6.857143
8  2  4.0  1.0   5.0   2  2.040816
9  5  7.0  5.0   3.0   1  2.664975
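If the rows can also contain zeros, one option is to mask them out before taking reciprocals. The following is only a sketch: it assumes zeros (and any negative values) should be skipped in the same way as NaN, and the helper name `row_harmonic_mean` and the small sample frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def row_harmonic_mean(df):
    """Row-wise harmonic mean that skips NaN and non-positive entries."""
    # Assumption: zeros and negatives are ignored, like NaN.
    # where() keeps positive entries and turns everything else into NaN.
    valid = df.where(df > 0)
    counts = valid.notna().sum(axis=1)      # number of usable values per row
    recip_sum = (1.0 / valid).sum(axis=1)   # sum() skips NaN by default
    # A row with no usable values gives 0 / 0.0, which pandas reports as NaN.
    return counts / recip_sum


# Hypothetical sample data, not the frame shown above.
df = pd.DataFrame({
    "a": [4, 2, 0],
    "b": [5.0, np.nan, np.nan],
    "c": [2.0, 1.0, 0.0],
})
df["hmean"] = row_harmonic_mean(df)
print(df)
```

Compute the new column from the original columns only; if `hmean` already exists in `df`, select the data columns first so it is not included in the mean.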
ALollz
  • Hi! Thank you for the answer. I get one error that I do not understand. It says: `Could not operate 1 with block values float division by zero`. Do you know what it means? – Jagruthi C Apr 08 '19 at 16:46
  • @JagruthiC I am not quite sure. It may be a divide-by-zero issue, though I can't replicate it: this seems to handle all-NaN rows and 0/NaN or #/0 on my end. – ALollz Apr 08 '19 at 16:57
0

You can use the built-in `.iloc` indexer and the `.to_list()` method to get each row as a list and pass it to your function.

rows = df.shape[0]
for i in range(rows):
    row_lst = df.iloc[i].to_list()
    print(harmonicMean(row_lst))
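To store the results in a new column instead of printing them, the same row-by-row idea can be sketched as below. This version assumes that NaN and non-positive values should simply be skipped, and `harmonic_mean` is a hypothetical rewrite of the function from the question, not the original. It calls `Series.tolist()`, which older pandas versions also provide; as far as I know, `to_list()` only exists from pandas 0.24 onward.

```python
import math

import pandas as pd


def harmonic_mean(values):
    """Harmonic mean of a list, ignoring NaN and non-positive entries."""
    # Assumption: zeros are skipped rather than treated as an error.
    valid = [v for v in values if not math.isnan(v) and v > 0]
    if not valid:
        return float("nan")  # nothing usable in this row
    return len(valid) / sum(1.0 / v for v in valid)


# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "a": [4.0, 2.0],
    "b": [4.0, float("nan")],
    "c": [2.0, 0.0],
})

# Build the whole list first, then assign it as the new column.
df["hmean"] = [harmonic_mean(row.tolist()) for _, row in df.iterrows()]
print(df)
```

Looping over rows like this is slower than a vectorized expression on large frames, but it keeps the per-row logic easy to adjust.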
nickyfot
  • `df.values` gives a NumPy ndarray, which can be iterated over row by row; that is faster. – Bhanu Tez Apr 08 '19 at 16:22
  • @nickthefreak Thank you, it worked! However, I get this error: `ZeroDivisionError: ('float division by zero', 'occurred at index 0')`, as the rows contain zeros as well. Any idea how I can ignore zero and NaN values while computing the harmonic mean? – Jagruthi C Apr 08 '19 at 16:28
  • I cannot tell whether the zero division error occurs when you divide by item or when you divide by the sum; it can probably happen in either division. You probably need to add an if statement checking that item and sum are greater than 0 before dividing. – nickyfot Apr 08 '19 at 16:34
  • I did try that, and I get this error: `row_lst = data1.iloc[i].to_list()` `File "C:\Users\Pinky\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 4376, in __getattr__` `return object.__getattribute__(self, name)` `AttributeError: 'Series' object has no attribute 'to_list'` – Jagruthi C Apr 08 '19 at 16:43