I'm in the process of cleaning a data frame, and one column contains lists as values. I'm trying to compute the average of each list and replace the column's values with an int while preserving the index. I can convert the values to a plain list efficiently, but I lose the index in the process. The code I've written below is too memory-intensive to execute. Is there simpler code that would work?

data: https://docs.google.com/spreadsheets/d/1Od7AhXn9OwLO-SryT--erqOQl_NNAGNuY4QPSJBbI18/edit?usp=sharing

def Average(lst):
    # Mean of a list whose elements may be numbers stored as strings
    sum1 = 0
    average = 0
    if len(lst) == 1:
        for obj in lst:
            average = int(obj)

    if len(lst) > 1:
        for year in lst:
            sum1 += int(year)
        average = sum1 / len(lst)

    return average

hello = hello[hello.apply([lambda x: mean(x) for x in hello])]

Here's the loop I used to convert the values into a list:

df_list1 = []

for x in hello:
    sum1 = 0
    average = 0
    if len(x) == 1:
        for obj in x:
            df_list1.append(int(obj))

    if len(x) > 1:
        for year in x:
            sum1 += int(year)
        # compute the average once, after the sum is complete
        average = sum1 / len(x)
        df_list1.append(int(average))
Cole

1 Answer

Use apply and np.mean.

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'listcol': [np.random.randint(1, 10, 5) for _ in range(3)]}, index=['a', 'b', 'c'])

# fillna() does not accept a list as the fill value, so replace missing cells
# with an empty list via apply; np.mean returns NaN on an empty list
df['listcol'] = df['listcol'].apply(lambda x: x if isinstance(x, (list, np.ndarray)) else [])

# can use this if all elements in lists are numeric
df['listcol'] = df['listcol'].apply(lambda x: np.mean(x))

# use this instead of the line above if the lists hold numbers stored as strings
# df['listcol'] = df['listcol'].apply(lambda x: np.mean([int(i) for i in x]))

Output

>>>df
   listcol
a      5.0
b      5.2
c      4.4
Eric Truett
  • Thanks for the help. However, I tried using that code and received the error "cannot perform reduce with flexible type". – Cole Apr 22 '20 at 23:50
  • If the column is not numeric, then you need to convert to int or float to calculate an average. – Eric Truett Apr 22 '20 at 23:54
  • The column is of type object that contains a mixture of lists and NaN values. How would I convert the lists to ints? – Cole Apr 23 '20 at 00:35
  • I'm now getting an error, "'float' object is not iterable". Is this because of the data type of the elements contained within the list? – Cole Apr 23 '20 at 01:12
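
Both errors in the comments above are consistent with a column that mixes lists of numeric strings with NaN: np.mean cannot average strings directly, and a NaN cell is a float, so iterating over it fails. Here is a minimal sketch of one way to handle both cases, assuming the lists hold strings that int() can parse. The helper name list_mean and the sample data are made up for illustration and are not taken from the linked spreadsheet.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: lists of numeric strings mixed with NaN, on a custom index
df = pd.DataFrame(
    {'listcol': [['2001', '2003'], np.nan, ['1999'], []]},
    index=['a', 'b', 'c', 'd'],
)

def list_mean(value):
    """Return the mean of a list of numeric strings, or NaN if there is no usable list."""
    # NaN cells are floats, not lists, so skip them (and empty lists) before iterating
    if not isinstance(value, (list, tuple, np.ndarray)) or len(value) == 0:
        return np.nan
    # Convert each element to int first; np.mean cannot average strings
    return float(np.mean([int(item) for item in value]))

# Series.apply returns a Series aligned to the original index, so labels are preserved
df['listcol'] = df['listcol'].apply(list_mean)
print(df)
```

Because the result is assigned back through apply, the index labels stay attached to their rows. The column ends up as float because NaN cannot be stored in a plain int column; dropping or filling the missing rows first would allow a later conversion to int.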