I'm trying to generate a code for each row of a dataframe such that the first 2 characters encode the category, the next 3 characters encode the sub-category, and the last 3 digits are a running number within that sub-category, for example -

CTSCT001
CTSCT002
CTSCT003

My code looks like this -

for sub_cat in dict_subcat_codes.keys():
    i=1
    for index, row in df[df['Sub Category'] == sub_cat].iterrows():
        row['Code'] = dict_cat_codes[row['Category']]+dict_subcat_codes[row['Sub Category']]+f'{i:03}'
        i+=1

When I debug the code by printing the generated codes, they print fine, but they are not being assigned to the Code column of the df. Am I doing something wrong? Is there a better way to deal with this?

harry04
  • `row` is not attached (is not a reference) to the dataframe. Updating `row` will not affect the DataFrame. – Henry Ecker Jul 15 '21 at 03:34
  • If you need to affect the dataframe you'll need to use [loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) on `df` to do the assignment -> `df.loc[index, 'Code'] = ...` – Henry Ecker Jul 15 '21 at 03:35
  • Does this answer your question? [Updating value in iterrow for pandas](https://stackoverflow.com/questions/25478528/updating-value-in-iterrow-for-pandas) – Henry Ecker Jul 15 '21 at 03:37

2 Answers

I assumed the values in the Text column may not always contain the same letters as in the sample and are likely to change. If they don't change, it is better to use named groups. Two options below:

If hardcoding parts of the text won't work, then use:

df = df.assign(code=df['Text'].str.extract(r'(^\D{2})'), subcode=df['Text'].str.extract(r'(\D{3}(?=\d))'), numbers=df['Text'].str.extract(r'(\d+$)'))

If the letters are fixed, use named groups:

df['Text'].str.extract(r'(?P<code>^[CT]+)(?P<subcode>[SCT]+)(?P<numbers>\d+$)')

Both options result in:

  code subcode numbers
0   CT     SCT     001
1   CT     SCT     002
2   CT     SCT     003
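
As a rough, self-contained illustration of the named-groups idea, here is a sketch that does not hardcode the letters. The sample `Text` values are taken from the question; the pattern itself is an assumption (two non-digits, three non-digits, then trailing digits) and would need adjusting if the real codes have a different layout.

```python
import pandas as pd

# Sample values copied from the question; a real frame would have more rows.
df = pd.DataFrame({'Text': ['CTSCT001', 'CTSCT002', 'CTSCT003']})

# Assumed layout: 2 non-digit chars, 3 non-digit chars, then digits to the end.
pattern = r'^(?P<code>\D{2})(?P<subcode>\D{3})(?P<numbers>\d+)$'

# With named groups, str.extract returns one column per group name.
parts = df['Text'].str.extract(pattern)
print(parts)
```

Rows that do not match the pattern come back as NaN in every column, which makes malformed codes easy to spot.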
wwnde

Based on Henry's comment, I replaced the assignment to row['Code'] with df.loc[index, 'Code'], which made it work for me -

for sub_cat in dict_subcat_codes.keys():
    i=1
    for index, row in df[df['Sub Category'] == sub_cat].iterrows():
        df.loc[index, 'Code'] = dict_cat_codes[row['Category']]+dict_subcat_codes[row['Sub Category']]+f'{i:03}'
        i+=1
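
As for a better way: a loop-free variant is possible with `groupby(...).cumcount()`, which numbers the rows within each sub-category. This is only a sketch; the sample rows and the contents of `dict_cat_codes` and `dict_subcat_codes` below are made up for illustration, and only the column and variable names come from the question.

```python
import pandas as pd

# Hypothetical mappings and data; only the names mirror the question.
dict_cat_codes = {'Cutlery': 'CT', 'Furniture': 'FR'}
dict_subcat_codes = {'Steel Cutlery': 'SCT', 'Wooden Chairs': 'WCH'}

df = pd.DataFrame({
    'Category': ['Cutlery', 'Furniture', 'Cutlery', 'Cutlery'],
    'Sub Category': ['Steel Cutlery', 'Wooden Chairs',
                     'Steel Cutlery', 'Steel Cutlery'],
})

# cumcount() is 0-based within each group, so add 1 and zero-pad to 3 digits.
seq = (df.groupby('Sub Category').cumcount() + 1).astype(str).str.zfill(3)

# Map each row to its category and sub-category code, then concatenate.
df['Code'] = (df['Category'].map(dict_cat_codes)
              + df['Sub Category'].map(dict_subcat_codes)
              + seq)
print(df)
```

One difference from the loop: a category or sub-category missing from the dictionaries gives NaN for that row's Code here, whereas the loop either skips the row or raises a KeyError.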
harry04