22

I have a DataFrame that contains a list in each cell, as shown in the example below with only two columns.

    Gamma   Beta
0   [1.4652917656926299, 0.9326935235505321, float] [91, 48.611034768515864, int]
1   [2.6008354611105995, 0.7608529935313189, float] [59, 42.38646954167245, int]
2   [2.6386970166722348, 0.9785848171888037, float] [89, 37.9011122659478, int]
3   [3.49336632573625, 1.0411524946972244, float]   [115, 36.211134224288344, int]
4   [2.193991200007534, 0.7955134305428825, float]  [128, 50.03563864975485, int]
5   [3.4574527664490997, 0.9399880977511021, float] [120, 41.841146628802875, int]
6   [3.1190582380554863, 1.0839109431114795, float] [148, 55.990072419824514, int]
7   [2.7757359940789916, 0.8889801332053203, float] [142, 51.08885697101243, int]
8   [3.23820908493237, 1.0587479742892683, float]   [183, 43.831293356668425, int]
9   [2.2509032790941985, 0.8896196407231622, float] [66, 35.9377662201882, int]

I'd like to extract, for every column, the first element of the list in each row, to get a DataFrame that looks as follows.

    Gamma   Beta
0   1.4652917656926299  91
1   2.6008354611105995  59
2   2.6386970166722348  89
...

So far, my solution would be something like `[row[1][0] for row in df_params.itertuples()]`, which I could repeat for every column index of the row and then assemble into my new DataFrame.

An alternative is `new_df = df_params['Gamma'].apply(lambda x: x[0])`, repeated in a loop over all the columns.

My question is, is there a less cumbersome way to perform this operation?

IanS
Ignacio Vergara Kausel
  • 1
    Similar question (with one additional way of solving it): https://stackoverflow.com/questions/26069235/python-pandas-selecting-element-in-array-column – Eulenfuchswiesel Mar 06 '19 at 12:17

4 Answers

63

You can use the str accessor for lists, e.g.:

df_params['Gamma'].str[0]

This should work for all columns:

df_params.apply(lambda col: col.str[0])
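
As a minimal sketch of the two lines above, the example below builds a small frame shaped like the one in the question (the values are shortened and the name `first_items` is only illustrative) and applies `.str[0]` to every column. The extracted columns may come back with object dtype, so the last step converts them explicitly.

```python
import pandas as pd

# Toy data shaped like the question's frame (values shortened for readability).
df_params = pd.DataFrame({
    "Gamma": [[1.47, 0.93, float], [2.60, 0.76, float]],
    "Beta": [[91, 48.61, int], [59, 42.39, int]],
})

# .str[0] indexes into each list; apply() repeats this for every column.
first_items = df_params.apply(lambda col: col.str[0])

# Convert to numeric dtypes explicitly, since the extracted values
# may still be stored as generic Python objects.
first_items = first_items.astype({"Gamma": float, "Beta": int})
print(first_items)
```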
IanS
  • 4
    str[0] is nice shot ;) – jezrael Aug 31 '17 at 13:56
  • 1
    Oh wow that's pretty nice. Goodbye list comprehension. Definitely getting my vote. – A.Kot Aug 31 '17 at 13:58
  • :D, :D, :D it is nice – jezrael Aug 31 '17 at 13:58
  • I wasn't that far off :). Thanks, worked like a charm! – Ignacio Vergara Kausel Aug 31 '17 at 13:59
  • 5
    `str` is an accessor to treat the value as a string, right? What makes this work? – Scro Dec 11 '19 at 21:31
  • 2
    @Scro, `Series.str[0]` calls [`StringMethods.__getitem__`](https://github.com/pandas-dev/pandas/blob/4e8306690e8c29080c1886442ffd209c259bcd3d/pandas/core/strings/accessor.py#L240), which ultimately calls [`_str_get`](https://github.com/pandas-dev/pandas/blob/4e8306690e8c29080c1886442ffd209c259bcd3d/pandas/core/strings/object_array.py#L267) when the key is not a slice. `_str_get` is also how `Series.str.get()` works, and the docs for that function say it can "extract element from lists, tuples, dict, or strings". Thus, some (but not all) methods of the `.str` accessor can work on non-strings. – ogdenkev Jun 08 '23 at 14:45
3

Itertuples would be pretty slow. You could speed this up with the following:

for column_name in df_params.columns:
    df_params[column_name] = [i[0] for i in df_params[column_name]]
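
The loop above overwrites `df_params` in place. If the original lists should be kept, a variant along the following lines builds a new frame instead. This is only a sketch with made-up sample values, and `first_items` is an illustrative name.

```python
import pandas as pd

# Toy data shaped like the question's frame (values shortened for readability).
df_params = pd.DataFrame({
    "Gamma": [[1.47, 0.93, float], [2.60, 0.76, float]],
    "Beta": [[91, 48.61, int], [59, 42.39, int]],
})

# Build a new frame column by column; df_params itself is left untouched.
first_items = pd.DataFrame(
    {name: [values[0] for values in df_params[name]] for name in df_params.columns},
    index=df_params.index,
)
print(first_items)
```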
A.Kot
1

This is also possible with Series.str.get:

df_params['Gamma'].str.get(0)
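
A short sketch of how `Series.str.get` treats lists of different lengths, using a made-up Series named `ragged`: a position that does not exist in a given list comes back as a missing value rather than raising an error.

```python
import pandas as pd

# Lists of different lengths, including an empty one (illustrative data).
ragged = pd.Series([[1.5, 0.9], [2.6], []])

# Position 0 exists in the first two lists only.
first = ragged.str.get(0)

# Position 1 exists in the first list only.
second = ragged.str.get(1)

print(first)
print(second)
```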
Ynjxsjmh
0

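The accepted answer may not give you what you need if the lengths of your lists vary, because positions missing from a shorter list come back as NaN.

In that case, you can unnest column-wise and drop the columns you don't need.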

Here is how to unnest. Taken from this answer.

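import pandas as pd

def unnesting(df, explode, axis):
    # explode: list of column names whose cells hold lists
    if axis == 1:
        # one row per list element
        df1 = pd.concat([df[x].explode() for x in explode], axis=1)
    else:
        # one column per list position, named <column><position>
        df1 = pd.concat(
            [pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode],
            axis=1,
        )
    # reattach the columns that were not unnested
    # (drop(columns=...) replaces the positional axis argument removed in pandas 2.0)
    return df1.join(df.drop(columns=explode), how='left')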
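
To illustrate the column-wise idea without the helper, the sketch below expands a single list column and keeps only its first position. The frame, the `label` column, and the names `wide` and `result` are made up for the example.

```python
import pandas as pd

# One list column with lists of different lengths, plus an ordinary column.
df = pd.DataFrame({"Gamma": [[1.5, 0.9, 3.0], [2.6, 0.8]], "label": ["a", "b"]})

# One column per list position: Gamma0, Gamma1, Gamma2.
# Shorter lists are padded with NaN.
wide = pd.DataFrame(df["Gamma"].tolist(), index=df.index).add_prefix("Gamma")

# Keep only the first position and reattach the untouched columns.
result = wide[["Gamma0"]].join(df.drop(columns=["Gamma"]))
print(result)
```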
Mr. Panda