I have a dataframe with several columns that look like this:
Age G GS
INDEX1 [27, 25, 22, 30, 30] [76, 79, 80, 76, 77] [76, 79, 80, 76, 77]
INDEX2 [24, 23, 21, 32, 34] [77, 76, 81, 75, 77] [77, 76, 81, 75, 77]
How do I go about splitting up all the lists into their own separate columns? Ideally my data frame will look like this once I have finished:
Age Age1 Age2 Age3 Age4 G G1 G2 G3 G4
INDEX1 27 25 22 30 30 76 79 80 76 77 ...
...
If it helps I did convert a dictionary to this data frame. I have tried searching and implementing several different similar solutions on stack but none of them seem to work. This solution converts properly but for some reason creates two NaN columns. If anyone know how to perform this on the entire dataframe, I can drop the extra NaN columns:
df1 = pd.DataFrame(converted['Age'].values.tolist())
df1
0 1 2 3 4 5 6
0 27 25 22 30 30.0 NaN NaN
1 31 29 33 27 33.0 NaN NaN
2 22 21 26 21 33.0 NaN NaN
3 29 24 31 33 27.0 NaN NaN
4 30 21 31 31 32.0 NaN NaN
... ... ... ... ... ... ... ...
1727 28 27 28 20 26.0 NaN NaN
1728 20 29 27 24 20.0 NaN NaN
1729 30 31 34 25 26.0 NaN NaN
1730 31 26 34 21 21.0 NaN NaN
1731 22 24 20 28 25.0 NaN NaN
There are a few other solutions I have tried but errors arose with the Age column, it may have something to do with hidden values but I am not sure.
df2 = pd.DataFrame()
for col in converted.columns:
# names of new columns
feature_columns = [ "{col}_feature1".format(col=col), "{col}_feature2".format(col=col), "{col}_feature3".format(col=col)
, "{col}_feature4".format(col=col)
, "{col}_feature5".format(col=col)]
# split current column
df2[ feature_columns ] = df[ col ].apply(lambda s: pd.Series({ feature_columns[0]: s[0],
feature_columns[1]: s[1],
feature_columns[2]: s[2],
feature_columns[3]: s[3],
feature_columns[4]: s[4]} ) )
print (df2)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'Age'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-180-53ed0043f9d8> in <module>
7 , "{col}_feature5".format(col=col)]
8 # split current column
----> 9 df2[ feature_columns ] = df[ col ].apply(lambda s: pd.Series({ feature_columns[0]: s[0],
10 feature_columns[1]: s[1],
11 feature_columns[2]: s[2],
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2978 if self.columns.nlevels > 1:
2979 return self._getitem_multilevel(key)
-> 2980 indexer = self.columns.get_loc(key)
2981 if is_integer(indexer):
2982 indexer = [indexer]
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
377 except ValueError:
378 raise KeyError(key)
--> 379 return super().get_loc(key, method=method, tolerance=tolerance)
380
381 @Appender(_index_shared_docs["get_indexer"])
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'Age'
Edit: I tried using solution listed here: Pandas split column of lists into multiple columns
And it didn't work for me. Thank you for the suggestion!