-2

I want to extract the number that sits between two pieces of text (e.g. FL-number-$) from the strings in the File names column and store it in a Check column. Examples:

  1. 2022-06-09-FR-Echecks.pdf > Return ''
  2. 2022-06-09-FR-FL-3-$797.pdf > Return 3
  3. 2022-06-09-FR-TX-20-$35149.91.pdf > Return 20

This is the code I used:

dt_test['File_names_page'] = dt_test['File names'].str.extract('\-([0-99])-\$')

It only returns single-digit numbers; multi-digit numbers such as 20 are not extracted.

So how can I extract the whole number (all of its digits) in my case?

Thanks for your attention!

Mr.Bean
  • `[0-99]` = `[0-9]` = `[90-9]` = `[996994992999999999990-999991999999]` - all match just one occurrence of a digit. `[0-9]+` = one or more digits, `[0-9]{2}` - two digits, `[0-9]{1,2}` - one or two digits, `[0-9]{1,3}` - one, two or three digits, etc. – Wiktor Stribiżew Oct 10 '22 at 09:40
  • Thanks Wiktor, it works for my case! – Mr.Bean Oct 10 '22 at 10:22
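
To make the point in the comment above concrete, here is a minimal sketch using only the standard `re` module rather than pandas. The helper `first_group` is a hypothetical name introduced for illustration; the file names are the three examples from the question.

```python
import re

# File names taken from the question.
names = [
    "2022-06-09-FR-Echecks.pdf",
    "2022-06-09-FR-FL-3-$797.pdf",
    "2022-06-09-FR-TX-20-$35149.91.pdf",
]

def first_group(pattern, text):
    """Hypothetical helper: first capture group of the first match, or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

# [0-99] is the range 0-9 plus a redundant literal 9, so it matches exactly one digit.
one_digit = [first_group(r"-([0-99])-\$", n) for n in names]

# [0-9]+ allows one or more digits between the hyphens.
any_digits = [first_group(r"-([0-9]+)-\$", n) for n in names]

print(one_digit)   # [None, '3', None]
print(any_digits)  # [None, '3', '20']
```

The single-digit class finds nothing in the third name because "20" has two characters between the hyphens, which matches the behaviour described in the question.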

2 Answers

0

You can't express a 0-99 range inside a character class; [0-99] matches a single digit. Use \d{1,2} for one or two digits:

dt_test['File_names_page'] = dt_test['File names'].str.extract(r'-(\d{1,2})-\$')

Or, for any number of digits (at least one), use \d+:

dt_test['File_names_page'] = dt_test['File names'].str.extract(r'-(\d+)-\$')

NB. The hyphen (-) doesn't need to be escaped outside a character class.

Example:

    File names File_names_page
0  ABC-12-$456              12
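
For completeness, here is a small sketch that applies the same pattern to the three file names from the question. The DataFrame is rebuilt here as an assumption about the asker's data. `expand=False` and `fillna("")` are additions not present in the answer above: the first returns a Series for the single capture group, and the second turns non-matching rows into the empty string the question asks for.

```python
import pandas as pd

# Assumed reconstruction of the asker's data, using the question's examples.
dt_test = pd.DataFrame({
    "File names": [
        "2022-06-09-FR-Echecks.pdf",
        "2022-06-09-FR-FL-3-$797.pdf",
        "2022-06-09-FR-TX-20-$35149.91.pdf",
    ]
})

# Capture one or more digits between "-" and "-$".
# expand=False yields a Series because there is only one capture group.
extracted = dt_test["File names"].str.extract(r"-(\d+)-\$", expand=False)

# Rows without a match come back as missing values; replace them with ''.
dt_test["File_names_page"] = extracted.fillna("")

print(dt_test["File_names_page"].tolist())  # ['', '3', '20']
```

Note that the extracted values are strings; a conversion such as `pd.to_numeric` would be needed if numeric values are wanted.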
mozway
0

Your regex pattern is slightly off. Just use \d+ to match any integer number:

dt_test["File_names_page"] = dt_test["File names"].str.extract(r'-(\d+)-\$')
Tim Biegeleisen