I am trying to calculate the mean and the standard deviation for pandas DataFrame columns whose cells contain lists of floats. I would rather not extract each list to do this, so I am trying to operate within the DataFrame itself. Surprisingly, I could not find anything on this particular topic.

Here is a toy example to illustrate my issue:

import pandas as pd

l = pd.DataFrame({'D' : [[4,5,6,6,6],[6,8,8,3]], 'R' : [[3,5,6,4,6],[6,9,9,3]]})

l1 = l.apply(pd.to_numeric).mean()
l2 = l.apply(pd.to_numeric).std()

I am getting the following error:

Traceback (most recent call last):
  File "pandas/_libs/lib.pyx", line 1892, in pandas._libs.lib.maybe_convert_numeric
TypeError: Invalid object type

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pierre/Desktop/Project_inv/pr.py", line 8, in <module>
    l1 = l.apply(pd.to_numeric).mean()
  File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 6487, in apply
    return op.get_result()
  File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/apply.py", line 151, in get_result
    return self.apply_standard()
  File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/apply.py", line 257, in apply_standard
    self.apply_series_generator()
  File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/apply.py", line 286, in apply_series_generator
    results[i] = self.f(v)
  File "/Users/pierre/PycharmProjects/untitled22/venv/lib/python3.7/site-packages/pandas/core/tools/numeric.py", line 135, in to_numeric
    coerce_numeric=coerce_numeric)
  File "pandas/_libs/lib.pyx", line 1925, in pandas._libs.lib.maybe_convert_numeric
TypeError: ('Invalid object type at position 0', 'occurred at index D')

I am not sure what is wrong. Does anyone have a hint on how to solve this issue?

Murcielago

1 Answer

First, I think storing lists in pandas is not a good idea.

But if you really need it, you can process the lists elementwise with DataFrame.applymap:

import numpy as np

l1 = l.applymap(lambda x: np.mean(x))
print (l1)
      D     R
0  5.40  4.80
1  6.25  6.75

l2 = l.applymap(lambda x: np.std(x))
print (l2)
          D         R
0  0.800000  1.166190
1  2.046338  2.487469
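
Note that np.std computes the population standard deviation (ddof=0) by default, whereas pandas' own std defaults to the sample version (ddof=1). A minimal sketch of getting both per cell, assuming the frame l from the question. The names pop_std and sample_std are only illustrative, and Series.map is used here in place of applymap, which recent pandas releases deprecate in favor of DataFrame.map:

```python
import numpy as np
import pandas as pd

l = pd.DataFrame({'D': [[4, 5, 6, 6, 6], [6, 8, 8, 3]],
                  'R': [[3, 5, 6, 4, 6], [6, 9, 9, 3]]})

# Go column by column, then cell by cell; each cell is a plain Python list.
# np.std defaults to ddof=0 (population standard deviation).
pop_std = l.apply(lambda col: col.map(np.std))

# Pass ddof=1 to get the sample standard deviation instead.
sample_std = l.apply(lambda col: col.map(lambda x: np.std(x, ddof=1)))

print(pop_std)
print(sample_std)
```

Which of the two is appropriate depends on whether each list is treated as a whole population or as a sample from one.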

So I recommend flattening the lists first, e.g. with DataFrame.explode (pandas 0.25+), and then aggregating by the original index:

df = pd.concat([l['D'].explode(), l['R'].explode()], axis=1).astype(int)
print (df)
   D  R
0  4  3
0  5  5
0  6  6
0  6  4
0  6  6
1  6  6
1  8  9
1  8  9
1  3  3

l1 = df.mean(level=0)
print (l1)
      D     R
0  5.40  4.80
1  6.25  6.75

l2 = df.std(level=0)
print (l2)
          D         R
0  0.894427  1.303840
1  2.362908  2.872281

l21 = df.std(level=0, ddof=0)
print (l21)
          D         R
0  0.800000  1.166190
1  2.046338  2.487469
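
The level argument of mean and std was deprecated in later pandas versions and removed in pandas 2.0, so the calls above may fail on a current install. A sketch of the same aggregation written with groupby(level=0), which should behave the same way on both old and new versions. The variable names are only illustrative, and the concat step assumes, as in this example, that the paired lists in each row have equal lengths:

```python
import pandas as pd

l = pd.DataFrame({'D': [[4, 5, 6, 6, 6], [6, 8, 8, 3]],
                  'R': [[3, 5, 6, 4, 6], [6, 9, 9, 3]]})

# Explode each column separately; the results share an identical index
# because the lists in each row have the same length here.
df = pd.concat([l[c].explode() for c in l.columns], axis=1).astype(int)

# Group by the original row label (index level 0) and aggregate.
g = df.groupby(level=0)
mean_ = g.mean()
std_sample = g.std()        # ddof=1, the pandas default
std_pop = g.std(ddof=0)     # matches np.std's default

print(mean_)
print(std_sample)
print(std_pop)
```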
jezrael