I have a dataframe of shape (2061, 5)
and the following line:
df[6] = df.groupby(df.index)[6].transform(lambda x: ' '.join(x))
..causes the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-19-27721ddd8064> in <module>
----> 1 df.groupby(df.index)[6].transform(lambda x: ' '.join(x))
~/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/pandas/core/groupby/generic.py in transform(self, func, *args, **kwargs)
463
464 if not isinstance(func, str):
--> 465 return self._transform_general(func, *args, **kwargs)
466
467 elif func not in base.transform_kernel_whitelist:
~/.pyenv/versions/miniconda3-latest/lib/python3.7/site-packages/pandas/core/groupby/generic.py in _transform_general(self, func, *args, **kwargs)
487 for name, group in self:
488 object.__setattr__(group, "name", name)
--> 489 res = func(group, *args, **kwargs)
490
491 if isinstance(res, (ABCDataFrame, ABCSeries)):
<ipython-input-19-27721ddd8064> in <lambda>(x)
----> 1 df.groupby(df.index)[6].transform(lambda x: ' '.join(x))
TypeError: sequence item 0: expected str instance, float found
I developed that code on a subset of the dataframe and it seemed to be doing exactly what I wanted to the data. So now if I for example do this:
df = df.head(50)
..and run the code, the error message goes away again.
I think somewhere, a type cast is happening except at one of the lines it decides to do something else. How can I efficiently find which row in the df is causing this without manually reading through the whole two thousand long column or a trial an error thing with .head()
of different sizes?