
Actually it should be quite simple. I have a pandas Series bar['Barcode'] from which I want to filter out EANs (barcodes with 12, 13 or 14 digits). Using a regex, I'm appending the matches to a new list within a loop. How do I delete those rows from the original Series at the same time?

import re
import pandas as pd

bar = pd.read_csv("barcode.csv", header=0, sep=';', engine='python')

ean = []
for i in bar['Barcode']:
    x = re.search("\d{12,14}", i)
    if x:
        ean.append(x.group())
        #bar.drop(bar['Barcode']==x.string, inplace=True)
print(ean)

The problem is the line I commented out. This is not the right way to do it, but I don't know what else would work. Could you help me delete the rows?

Thanks in advance!

1 Answer

I'd just accumulate everything into a list and drop afterwards; mutating an object while you're iterating over it is asking for trouble!
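
A minimal sketch of that idea applied directly to your loop (assuming the bar frame read from your barcode.csv) would collect the index labels while iterating and drop them all in one go afterwards:

import re
import pandas as pd

bar = pd.read_csv("barcode.csv", header=0, sep=';', engine='python')

ean = []
to_drop = []  # index labels of rows whose barcode matched
for idx, value in bar['Barcode'].items():
    x = re.search(r"\d{12,14}", value)
    if x:
        ean.append(x.group())
        to_drop.append(idx)

bar = bar.drop(to_drop)  # drop every matched row at once, after the loop
print(ean)

The rest of this answer goes a more pandas-style route.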

To start with, make it into an MWE (minimal working example):

import re
import pandas as pd

df = pd.DataFrame(
    [(i, '1' * i) for i in range(10, 17)],
    columns=['i', 'barcode']
)
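
For reference, printing df shows something like this (exact column alignment may differ); the i column makes it easy to see which barcodes fall in the 12-14 digit range:

print(df)
#     i           barcode
# 0  10        1111111111
# 1  11       11111111111
# 2  12      111111111111
# 3  13     1111111111111
# 4  14    11111111111111
# 5  15   111111111111111
# 6  16  1111111111111111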

With that simple two-column dataframe in hand, we can then go the verbose route of defining a function to do the matching and applying it to the column:

def match(s):
    # return the barcode itself if it is exactly 12-14 digits, else None
    m = re.match(r'^\d{12,14}$', s)
    if m:
        return m.group()

df['match'] = df['barcode'].apply(match)

Note that I use an r prefix to make it a raw string, so the backslash isn't interpreted as a Python string escape, and use ^ and $ to match the beginning and end of the string.
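
To see why the anchors matter, a quick check (re.fullmatch would be an equivalent way to spell the same thing):

import re

# without the $, a 16-digit string still "matches": the regex just grabs
# the first 14 digits and stops
print(re.match(r'\d{12,14}', '1' * 16))    # match='11111111111111'

# with the anchors, the whole string has to be 12-14 digits
print(re.match(r'^\d{12,14}$', '1' * 16))  # None
print(re.match(r'^\d{12,14}$', '1' * 13))  # match='1111111111111'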

You can then use this column to filter the dataframe:

df[~df['match'].isnull()]

which gives us back the three rows that match.
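
With the MWE above, the filtered frame comes back as (column widths approximate):

print(df[~df['match'].isnull()])
#     i         barcode           match
# 2  12    111111111111    111111111111
# 3  13   1111111111111   1111111111111
# 4  14  11111111111111  11111111111111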

If you want a one-liner and don't care about the matched string, you could do:

df[df['barcode'].apply(lambda s: re.match(r'^\d{12,14}$', s) is not None)]

but I'd say code like this is bordering on the unreadable.
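
A more readable middle ground, if your pandas is new enough to have Series.str.fullmatch (added around 1.1, if I remember right), is to let the string accessor build the boolean mask for you:

# str.fullmatch gives a boolean Series: True where the whole barcode
# is 12-14 digits, so it can be used directly as a mask
df[df['barcode'].str.fullmatch(r'\d{12,14}')]

# on older pandas versions, str.match with explicit anchors does the same job
df[df['barcode'].str.match(r'^\d{12,14}$')]

Either way the regex stays in one place and the intent reads straight off the line.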
