
I have the following data frame:

   a    b             x  
0  id1  abc 123 tr    2  
1  id2  abd1 124 tr   6 
2  id3  abce 126 af   9 
3  id4  abe 128 nm    12 

From column b, I need to extract, for each row, the substring before the first space. The expected result is:

list_of_strings = ['abc', 'abd1', 'abce', 'abe']

Please advise

Tipo33

1 Answer


Use str.extract with a regex built on ^\S+ (one or more non-whitespace characters anchored to the start of the string). str.extract requires a capture group, hence ^(\S+):

df['b'].str.extract(r'^(\S+)', expand=False)

Output:

0     abc
1    abd1
2    abce
3     abe
Name: b, dtype: object
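To see what the pattern captures on its own, here is a minimal sketch using Python's built-in re module on the sample values from the question. The helper name first_token is hypothetical and only serves the illustration; it is not part of pandas.

```python
import re

# Sample values copied from column b in the question.
values = ["abc 123 tr", "abd1 124 tr", "abce 126 af", "abe 128 nm"]


def first_token(text):
    """Return the leading run of non-whitespace characters, or None if there is none."""
    match = re.match(r"^(\S+)", text)
    return match.group(1) if match else None


tokens = [first_token(v) for v in values]
print(tokens)
```

Note that a string starting with whitespace (or an empty string) does not match at all; in that case str.extract returns NaN for the row.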

For a list:

list_of_strings = df['b'].str.extract(r'^(\S+)', expand=False).tolist()
# ['abc', 'abd1', 'abce', 'abe']
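For a reproducible version, the sketch below rebuilds the data frame from the question (column values are copied from the post) and compares the regex approach with a split-based alternative. The str.split variant is an assumption about what may also suit this data, not something the answer above proposes.

```python
import pandas as pd

# Data frame reconstructed from the question.
df = pd.DataFrame({
    "a": ["id1", "id2", "id3", "id4"],
    "b": ["abc 123 tr", "abd1 124 tr", "abce 126 af", "abe 128 nm"],
    "x": [2, 6, 9, 12],
})

# Approach from the answer: capture the leading non-whitespace run.
via_regex = df["b"].str.extract(r"^(\S+)", expand=False).tolist()

# Alternative sketch: split once on whitespace and keep the first piece.
via_split = df["b"].str.split(n=1).str[0].tolist()

print(via_regex)
print(via_split)
```

On this sample both approaches give the same list, so the choice is mostly a matter of preference.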


mozway