
I am new to Excel automation with Python. Please help me extract specific numbers that are embedded among other characters in a column.

Actual DATA 

                Column A
     kDGK~202287653976 ~LD ~ 8904567
     SIP~12335678 ~202267858245~LD~8936272
     SIN112592~ LD ~ SIN112592
     0194X0322 ~ LD ~ 202243296291
     

Expected Output

                Column B
             202287653976
             202267858245
                  -
             202243296291
     

I want to extract the 12-digit number that starts with "2022"; rows that don't contain one should be left as a blank cell. It is a simple condition, but I can't figure out how to do it.

Thanks in advance.

  • Would be a rather simple [formula in Excel](https://stackoverflow.com/q/61837696/9758194) itself. If with python and dataframes I'm thinking about [regular expressions to extract the substring](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.extract.html). – JvdV Apr 08 '22 at 10:23

1 Answer

To extract your number in Python just use a regex:

import re
col_a = 'kDGK~202287653976 ~LD ~ 8904567'
match = re.search(r'(2022\d+)', col_a)
if match:
    col_b = match[0]

`match` will be `None` if nothing is found; otherwise it is a match object, in which case `match[0]` is your desired number.

EDIT

The above regex will match "2022" followed by any number of digits. If you want exactly 8 digits after "2022" you should write re.search(r'(2022\d{8})', col_a) instead.
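As a rough check of the exact-length variant against the four sample rows from the question, something like the sketch below could be used. The helper name `extract_2022_id` is hypothetical, and the lookarounds `(?<!\d)` and `(?!\d)` are an extra assumption of mine: they reject a "2022" run that is part of a longer digit string, which the plain `2022\d{8}` pattern would still accept.

```python
import re

# Assumption: a valid number is exactly 12 digits, starts with "2022",
# and is not embedded in a longer run of digits.
PATTERN = re.compile(r'(?<!\d)2022\d{8}(?!\d)')

def extract_2022_id(text):
    """Return the first 12-digit number starting with 2022, or '' if there is none."""
    if not isinstance(text, str):
        return ''  # empty or non-text cells produce a blank result
    match = PATTERN.search(text)
    return match[0] if match else ''

# Sample rows copied from the question
rows = [
    'kDGK~202287653976 ~LD ~ 8904567',
    'SIP~12335678 ~202267858245~LD~8936272',
    'SIN112592~ LD ~ SIN112592',
    '0194X0322 ~ LD ~ 202243296291',
]
results = [extract_2022_id(row) for row in rows]
print(results)
```

The third row has no matching number, so it comes back as an empty string, which can be written to the cell as a blank.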

EDIT 2

Assuming you are using openpyxl the full code will be something like

from openpyxl import load_workbook
import re

wb = load_workbook('somefile.xlsx')
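ws = wb.active

# openpyxl rows are 1-indexed, so start at 1 rather than 0
for row in range(1, len(ws['A']) + 1):  # number of used cells in column A
    value = ws.cell(row, 1).value
    # empty cells hold None and numeric cells hold numbers, so convert to text first
    match = re.search(r'(2022\d+)', str(value or ''))
    if match:
        ws.cell(row, 2).value = match[0]

wb.save('somefile.xlsx')  # write the changes back to the file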
gimix