Extract substrings from a column of strings and place them in a list

Question

I have the following data frame:

   a    b             x  
0  id1  abc 123 tr    2  
1  id2  abd1 124 tr   6 
2  id3  abce 126 af   9 
3  id4  abe 128 nm    12

From column b, I need to extract, for each item, the substring before the first space. That is, I need the following result:

list_of_strings = ['abc', 'abd1', 'abce', 'abe']

Please advise

score 2 · Accepted Answer · answered May 24 '23 at 14:52

Use str.extract with the regex ^(\S+), which captures the run of non-whitespace characters anchored to the start of each string:

df['b'].str.extract(r'^(\S+)', expand=False)

Output:

0     abc
1    abd1
2    abce
3     abe
Name: b, dtype: object

For a list:

list_of_strings = df['b'].str.extract(r'^(\S+)', expand=False).tolist()
# ['abc', 'abd1', 'abce', 'abe']
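
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the data frame from the question and applies the same regex. The second variant, which splits once on whitespace and keeps the first piece, is not part of the original answer; it is shown only as a possible alternative, and the names via_extract and via_split are made up for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "a": ["id1", "id2", "id3", "id4"],
    "b": ["abc 123 tr", "abd1 124 tr", "abce 126 af", "abe 128 nm"],
    "x": [2, 6, 9, 12],
})

# Regex approach from the answer: capture leading non-whitespace characters
via_extract = df["b"].str.extract(r"^(\S+)", expand=False).tolist()

# Alternative (an assumption, not from the answer): split once on
# whitespace and take the first element of each resulting list
via_split = df["b"].str.split(n=1).str[0].tolist()

print(via_extract)
print(via_split)
```

Both variants give the same list on this data. They can differ on values that start with whitespace: the anchored regex finds no match there and yields NaN, while the split skips the leading whitespace and returns the first token.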
