Get first half of string from pandas dataframe column

Question

I want to get the first half of a string from a pandas dataframe column, where the length varies row by row. I have searched around and found questions like this, but the solutions all focus on delimiters and regular expressions. I don't have a delimiter - I just want the first half of the string, however long it is.

I can get as far as specifying the string length I want:

import pandas as pd

eggs = pd.DataFrame({"id": [0, 1, 2, 3],
                     "text": ["eggs and spam", "green eggs and spam", "eggs and spam2", "green eggs"]})

eggs["half_length"] = eggs.text.str.len() // 2

and then I want to do something like eggs["truncated_text"] = eggs["text"].str[:eggs.half_length]. Or is defining this column the wrong way to go in the first place? Can anyone help?

What is your definition of "first half"? Is "and" included in the count? If you have three words, how would you define half? — Ade_1, May 23 '21 at 21:33

score 1 · Accepted Answer · answered May 23 '21 at 21:44

You can apply a function to the text column:

import pandas as pd

eggs = pd.DataFrame({"id": [0, 1, 2, 3],
                     "text": ["eggs and spam", "green eggs and spam", "eggs and spam2", "green eggs"]})

eggs['truncated_text'] = eggs['text'].apply(lambda text: text[:len(text) // 2])

Output

|   id | text                | truncated_text   |
|-----:|:--------------------|:-----------------|
|    0 | eggs and spam       | eggs a           |
|    1 | green eggs and spam | green egg        |
|    2 | eggs and spam2      | eggs an          |
|    3 | green eggs          | green            |
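One thing to watch for: the lambda above calls len() on every value, so it would raise a TypeError if the column contains missing values. A minimal sketch of one way around that, assuming you want missing values passed through unchanged (the helper name first_half and the sample row with None are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with one missing value
eggs = pd.DataFrame({"id": [0, 1, 2],
                     "text": ["eggs and spam", None, "green eggs"]})

def first_half(text):
    # Pass missing values through untouched; len() would fail on them
    if pd.isna(text):
        return text
    # Integer division rounds down, so odd lengths drop the middle character
    return text[:len(text) // 2]

eggs["truncated_text"] = eggs["text"].apply(first_half)
```

Whether missing values should stay missing or become empty strings depends on your data, so treat this as a starting point.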

score 1 · Answer 2 · dmm98 · answered Nov 03 '21 at 16:12 · edited Nov 03 '21 at 16:18

You can also do this with vectorized operations, which are usually faster than the .apply method. This article explains vectorized operations in more depth: https://realpython.com/fast-flexible-pandas/

An example of using vectorized operations for strings can be found in the following post: Pandas make new column from string slice of another column
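To connect this to the question: as far as I know, slicing through the .str accessor takes the same bounds for every row, so it covers a fixed-width cut but not a per-row length. A rough sketch, assuming the half_length column from the question, that pairs each string with its own length (note the list comprehension is a plain Python loop, not a true vectorized operation; the first_five column is only there for comparison):

```python
import pandas as pd

eggs = pd.DataFrame({"id": [0, 1, 2, 3],
                     "text": ["eggs and spam", "green eggs and spam", "eggs and spam2", "green eggs"]})

# Per-row half length, as in the question
eggs["half_length"] = eggs["text"].str.len() // 2

# Fixed-width slice: the same cut point is applied to every row
eggs["first_five"] = eggs["text"].str[:5]

# Per-row slice: pair each string with its own length
eggs["truncated_text"] = [text[:n] for text, n in zip(eggs["text"], eggs["half_length"])]
```

This gives the same truncated_text values as the accepted answer while reusing the half_length column. Whether it is actually faster than .apply on your data is worth timing rather than assuming.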

