0

Excel function ->

=IFERROR(MID(E14330,FIND("-0",E14330)-3,9),"None")

I have tried the following and it works but is very slow. I have also tried using lambda function but that didn't help either (got syntax error)

get_JournalData_05['PO'] = ""
    
for i, row in get_JournalData_05.iterrows():
    if get_JournalData_05['zero index'].iloc[i] == -1:
        get_JournalData_05['PO'] = 'None'
    else:
        get_JournalData_05['PO'] = get_JournalData_05['Description'].iloc[i][int(get_JournalData_05['zero index'].iloc[i] - 3):int(get_JournalData_05['zero index'].iloc[i] + 9)]

get_JournalData_05['zero index'] is the column with the index I found using:

get_JournalData_05['zero index'] = get_JournalData_05['Description'].str.find('-0')

I need to fill the PO column based on extraction from the Description column.

ouroboros1
  • 9,113
  • 3
  • 7
  • 26
  • I see you are using iterrows, which get you the index & row info as well, so you can use row['zero_index'] instead of get_JournalData_05['zero index'].iloc[i] The same thing applies for every column inside the loop. Check is this works for you. Also you are using full index assignment : get_JournalData_05['PO'] = 'None' This will assign the value to entire dataframe instead of just that specific row. Use get_JournalData_05.at[i,'PO'] = 'None' – Asetti sri harsha Jul 21 '23 at 13:33

2 Answers2

0

Instead of using iterrows, look for ways to vectorize operations (see this SO post). In your case, we can use Series.str.extract:

Excel

Excel

Pandas

import pandas as pd

data = {'Description': ['deleteABC-01234delete', 'deleteABC-012',
                        'failABC_012fail', 'BC-012fail']}

get_JournalData_05 = pd.DataFrame(data)

get_JournalData_05['PO'] = get_JournalData_05['Description'].str.extract(r'(.{3}-.{,5})')

get_JournalData_05

             Description         PO
0  deleteABC-01234delete  ABC-01234
1          deleteABC-012    ABC-012
2        failABC_012fail        NaN  <- no `-` character found
3             BC-012fail        NaN  <- only 2 characters preceding `-`

For an explanation of the regular expression, see here. Failures to extract the pattern will lead to NaN values (index 2 & 3). If you insist on the string "None", you could chain .fillna('None').

ouroboros1
  • 9,113
  • 3
  • 7
  • 26
0

I was able to solve the problem using this function:

def extract_substring(input_string):
    try:
        index = input_string.find("-0")
        if index != -1:
            return input_string[index - 3:index + 6]
        else:
            return "None"
    except AttributeError:
        return "None"

This approach is way faster than iterating through each row.

doneforaiur
  • 1,308
  • 7
  • 14
  • 21