
I have a DataFrame with a column whose values are lists of numbers:

idx codes       new_column
0   [12,18,5]
1   [22,15]
2   [4]
3   [15,1]

How can I add a new column that holds the first element of each list in the codes column, like this:

idx codes     new_column
0   [12,18,5]  12
1   [22,15]    22
2   [4]         4
3   [15,1]     15
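For anyone who wants to experiment, the example frame above can be rebuilt like this. This is a sketch: the default integer index stands in for the idx column, and `expected` is just the target column written out by hand.

```python
import pandas as pd

# Input: each cell of 'codes' holds a Python list (pandas keeps them in an object column)
df = pd.DataFrame({'codes': [[12, 18, 5], [22, 15], [4], [15, 1]]})

# Desired new_column: the first element of each list
expected = [12, 22, 4, 15]
print(df)
```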

I tried:
    df['new_column']=df['codes'][0]

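However, that didn't work: df['codes'][0] selects the whole list stored in row 0 ([12,18,5]) rather than the first element of every row's list, and assigning that 3-element list to a 4-row frame raises a length-mismatch ValueError.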
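A short reproduction of the failed attempt, using data rebuilt from the question's table. It shows that `df['codes'][0]` is a row lookup, not an element-wise operation:

```python
import pandas as pd

df = pd.DataFrame({'codes': [[12, 18, 5], [22, 15], [4], [15, 1]]})

# With the default integer index, df['codes'][0] is a label lookup:
# it returns the single list stored in row 0, not one element per row
first_row = df['codes'][0]
print(first_row)  # [12, 18, 5]

# Assigning that 3-element list to a 4-row frame fails the length check
raised = False
try:
    df['new_column'] = df['codes'][0]
except ValueError as err:
    raised = True
    print('ValueError:', err)
```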
Edited by cs95; asked by afshin.

1 Answer


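The easiest way is the .str accessor: str.get(0), or its indexing shorthand str[0]. Both return NaN instead of raising when a list is empty or the value is NaN: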

    # equivalent: df['new_column'] = df['codes'].str.get(0)
    df['new_column'] = df['codes'].str[0]
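A minimal check that both spellings produce the expected column, and that they fall back to NaN on an empty list or a missing value. The small series `s` is made-up data for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'codes': [[12, 18, 5], [22, 15], [4], [15, 1]]})

# .str[0] is shorthand for .str.get(0): take element 0 of each list
df['new_column'] = df['codes'].str[0]
print(df['new_column'].tolist())  # [12, 22, 4, 15]

# Hypothetical messy data: both forms yield NaN rather than raising
s = pd.Series([[12, 18, 5], [], np.nan])
out = s.str.get(0)
print(out.tolist())
```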

However, for speed I would suggest a list comprehension, provided every value is a non-empty list (no NaNs):

    df['new_column'] = [lst[0] for lst in df['codes']]
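A quick check of the comprehension on the question's data, plus why it needs clean input: plain indexing raises IndexError on an empty list and TypeError on a NaN, which is a Python float and not subscriptable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'codes': [[12, 18, 5], [22, 15], [4], [15, 1]]})

# Plain Python indexing on each list: fast, but no guard against bad values
df['new_column'] = [lst[0] for lst in df['codes']]
print(df['new_column'].tolist())  # [12, 22, 4, 15]

# The same indexing fails on the values that .str[0] tolerates
errors = []
for bad in ([], np.nan):
    try:
        bad[0]
    except (IndexError, TypeError) as err:
        errors.append(type(err).__name__)
print(errors)  # ['IndexError', 'TypeError']
```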

If some lists can be empty, fall back to NaN for them:

    import numpy as np

    df['new_column'] = [lst[0] if len(lst) > 0 else np.nan for lst in df['codes']]
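Applied to hypothetical data with one empty list. Note that the new column comes out as float64, because NaN cannot be stored in an int64 column:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second list is empty
df = pd.DataFrame({'codes': [[12, 18, 5], [], [4], [15, 1]]})

# Fall back to NaN when a list has no first element
df['new_column'] = [lst[0] if len(lst) > 0 else np.nan for lst in df['codes']]
print(df['new_column'].tolist())
```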

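To handle NaNs with the list comprehension, use a boolean mask of the non-null rows with loc: run the comprehension on those rows only and assign the result back. If new_column does not exist yet, the masked-out rows are filled with NaN.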

    import numpy as np

    m = df['codes'].notna()  # True for rows whose value is not NaN
    df.loc[m, 'new_column'] = [
        lst[0] if len(lst) > 0 else np.nan for lst in df.loc[m, 'codes']]
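The masked version on hypothetical data that contains both a NaN and an empty list. The NaN row is skipped by the mask, so it keeps the NaN fill that loc gives the newly created column:

```python
import numpy as np
import pandas as pd

# Hypothetical data mixing a NaN entry and an empty list
df = pd.DataFrame({'codes': [[12, 18, 5], np.nan, [], [15, 1]]})

m = df['codes'].notna()  # [True, False, True, True]
df.loc[m, 'new_column'] = [
    lst[0] if len(lst) > 0 else np.nan for lst in df.loc[m, 'codes']]

print(df['new_column'].tolist())
```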

Obligatory why-is-list-comp-worth-it link: For loops with pandas - When should I care?
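To check the speed claim on your own data, a rough benchmark along these lines can help. The 285,000-row size is borrowed from the asker's follow-up comment, and the random lists are synthetic; no timings are shown because they depend on the machine and the pandas version.

```python
import random
import timeit

import pandas as pd

# Synthetic benchmark data (hypothetical): 285,000 non-empty lists of ints
random.seed(0)
df = pd.DataFrame({
    'codes': [[random.randint(1, 99) for _ in range(random.randint(1, 5))]
              for _ in range(285_000)]
})


def with_str_accessor():
    return df['codes'].str[0]


def with_list_comp():
    return [lst[0] for lst in df['codes']]


# Both approaches must agree before their speed is worth comparing
assert with_str_accessor().tolist() == with_list_comp()

runs = 3
t_str = timeit.timeit(with_str_accessor, number=runs) / runs
t_comp = timeit.timeit(with_list_comp, number=runs) / runs
print(f".str[0]:            {t_str:.4f} s per run")
print(f"list comprehension: {t_comp:.4f} s per run")
```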

Answered by cs95
  • I tried df['codes'].str[0] and on 285,000 records it was very fast, about 2 sec. Would a list comprehension be even faster? – afshin Jan 15 '19 at 18:13
  • @afshin Yes! Try it. – cs95 Jan 15 '19 at 19:02
  • It is interesting that `pd.Series.str` is useful beyond strings: it also indexes into lists, and even a list of dict objects works well with it. For example, to take the first author from a list of dictionaries in a complex column and count the unique authors: `node_df["authors"].str[0].str["#text"].nunique()` – rjurney Feb 06 '23 at 23:09
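A small illustration of chaining .str on lists of dicts. `node_df` and the "#text" key come from the comment; the author names are made up. Counting distinct values uses nunique(), since unique() returns a NumPy array, which has no count() method.

```python
import pandas as pd

# Hypothetical data: each cell is a list of author dicts keyed by "#text"
node_df = pd.DataFrame({
    'authors': [
        [{'#text': 'Ada'}, {'#text': 'Bob'}],
        [{'#text': 'Ada'}],
        [{'#text': 'Cy'}, {'#text': 'Dee'}],
    ]
})

# .str[0] takes the first dict of each list; .str['#text'] then looks up that key
first_authors = node_df['authors'].str[0].str['#text']
print(first_authors.tolist())  # ['Ada', 'Ada', 'Cy']

# Count the distinct first authors
print(first_authors.nunique())  # 2
```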