Mapping AlphaNumeric Strings using Python

Question

I have a dataset of names. Based on the alphanumeric string in each Name, I need to map it to a Subname as shown below.

Name            Subname
9-AIF-09        9A09
980-PD-Z09A     980P09
15-KIC-12       15K12
PIA-110H        P-110
IC009A          I009A

Rules could be defined by hand, for example: if 'A' is present in the name, keep all digits and the letter 'A'; if 'P' is present in the name, carry forward only 'P'. However, the algorithm itself should identify how the mapping is done.

Is there an algorithm I can use to identify such patterns from a training dataset and then predict the Subname for new names?

Very interesting question! Sadly, search engines are helpless to find whether someone already tackled this problem. They keep returning pages about pattern-matching, not about pattern inferring. — Stef, Mar 29 '22 at 10:13
This is somewhat related: [Grammatical inference of regular expressions for given finite list of representative strings?](https://stackoverflow.com/questions/15512918/grammatical-inference-of-regular-expressions-for-given-finite-list-of-representa) — Stef, Mar 29 '22 at 10:14

score 1 · Answer 1 · answered Mar 27 '22 at 17:31

I see two options.

Capture three groups (everything before the first letter, the first letter, everything after it) and remove all non-digits from groups 1 and 3:

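import re

# Assumes df is a pandas DataFrame with a 'Name' column.
# Split at the first letter that follows a run of non-letters, then keep
# only the digits from the parts before and after that letter.
df['Subname'] = df['Name'].str.replace(
    r'([^a-zA-Z]+)([a-zA-Z])(.*)',
    lambda m: (re.sub(r'\D', '', m.group(1))
               + m.group(2)
               + re.sub(r'\D', '', m.group(3))),
    regex=True)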

Or define an explicit pattern: optional non-digits, digits, optional separators, one letter, optional non-digits, digits:

df['Subname'] = (df['Name'].str.extract(r'\D*(\d+)[^\da-zA-Z]*([a-zA-Z])\D*(\d+)')
                           .agg(''.join, axis=1)
                 )

Output:

          Name Subname
0     9-AIF-09    9A09
1  980-PD-Z09A  980P09
2    15-KIC-12   15K12
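The two options can also be checked without pandas. The following is a minimal sketch using only the standard `re` module, run on all five names from the question, including the last two rows that do not appear in the output above. The function names `option1` and `option2` are hypothetical and only mirror the two snippets in this answer.

```python
import re

# The five names from the question's table.
NAMES = ["9-AIF-09", "980-PD-Z09A", "15-KIC-12", "PIA-110H", "IC009A"]

# Same regexes as the two pandas snippets above.
OPTION1 = re.compile(r'([^a-zA-Z]+)([a-zA-Z])(.*)')
OPTION2 = re.compile(r'\D*(\d+)[^\da-zA-Z]*([a-zA-Z])\D*(\d+)')


def option1(name):
    """Keep the first letter after a non-letter run, plus digits around it."""
    def repl(m):
        return (re.sub(r'\D', '', m.group(1))
                + m.group(2)
                + re.sub(r'\D', '', m.group(3)))
    return OPTION1.sub(repl, name)


def option2(name):
    """Join digits, one letter, digits; return None when the pattern fails."""
    m = OPTION2.search(name)
    return ''.join(m.groups()) if m else None


for name in NAMES:
    print(name, option1(name), option2(name))
```

With these regexes the first three names give `9A09`, `980P09` and `15K12` under both options. For `PIA-110H` and `IC009A`, option 1 returns `PIA110H` and `IC009A`, and option 2 finds no match, so neither option reproduces the `P-110` and `I009A` rows of the question's table.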

answered Mar 27 '22 at 17:31

mozway

Is there an ML way to have the pattern recognized automatically from training data? – spd Mar 28 '22 at 12:54
Or an NLP approach, @mozway? – spd Mar 28 '22 at 17:31
Please go through this: https://stackoverflow.com/q/71577325/17778275 – spd Apr 02 '22 at 07:29
Is this question solved? – mozway Apr 02 '22 at 07:59
This doesn't solve the exact purpose, which is identifying the pattern itself. – spd Apr 02 '22 at 08:03
@spd then please make the question more explicit. Provide other examples, describe the logic, etc. – mozway Apr 02 '22 at 08:32
Please check the edited question. The mapping logic can be different every time, so I need an ML/NLP way to identify these patterns. – spd Apr 06 '22 at 14:38
Can I use a context-free language approach? – spd Apr 06 '22 at 18:09
Sorry, the requirements are too vague to me – mozway Apr 06 '22 at 18:14
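
One possible way to make the request in these comments more concrete, offered only as a sketch and not as something proposed in the answer: in all five rows of the question, the Subname can be obtained by deleting characters from the Name. Under that assumption, each (Name, Subname) pair can be converted into per-character keep/drop labels, which could then serve as training data for a sequence-labelling model. The helper name `keep_labels` is hypothetical.

```python
def keep_labels(name, subname):
    """Label each character of name as kept ('K') or dropped ('.').

    Uses a greedy left-to-right subsequence alignment. Returns None when
    subname cannot be obtained from name by deleting characters.
    """
    labels = []
    j = 0  # index of the next subname character still to be matched
    for ch in name:
        if j < len(subname) and ch == subname[j]:
            labels.append('K')
            j += 1
        else:
            labels.append('.')
    return ''.join(labels) if j == len(subname) else None


# The five (Name, Subname) pairs from the question.
PAIRS = [
    ("9-AIF-09", "9A09"),
    ("980-PD-Z09A", "980P09"),
    ("15-KIC-12", "15K12"),
    ("PIA-110H", "P-110"),
    ("IC009A", "I009A"),
]

for name, subname in PAIRS:
    print(name, keep_labels(name, subname))
```

For example, `9-AIF-09` with `9A09` yields `K.K...KK`. Whether a model trained on such labels generalizes depends on how consistent the mapping rules really are, which the five examples alone do not settle.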
