
I have a variety of values in a text field of a CSV.

Some values look something like this: AGM00BALDWIN, AGM00BOUCK

However, some have duplicates, which changes the names to AGM00BOUCK01, AGM00COBDEN01, AGM00COBDEN02

My goal is to write a specific ID to values NOT containing a numeric suffix.

Here is the code so far:

prov_count = 3000
prov_ID = 0
items = (name, x, y)
xy_tup = tuple(items)

if "*1" not in name and "*2" not in name:
    prov_ID = prov_count + 1
else:
    prov_ID = ""

It seems that the wildcard isn't the appropriate method here, but I can't seem to find an appropriate solution.

PolyGeo
Whitty21
  • You can check the last 2 characters using `name[-2:]`. What is the max number of duplicates in your data? – Farhan.K Oct 25 '16 at 15:08
  • Is it possible that a non-duplicate ends with a number, or if you see things like `01`, `02`... at the end of the name, that will unequivocally mean that it's a duplicate? – Savir Oct 25 '16 at 15:09
  • Try with `name.endswith(("1", "2"))` – rnbguy Oct 25 '16 at 15:12
  • @Farhan.K I can't see any more than 2 digits' worth of duplicates – Whitty21 Oct 25 '16 at 15:13
  • `"*1"` is not a wildcard; it checks for the literal character `"*"`. If you want to do a wildcard search, see [this thread](http://stackoverflow.com/questions/11427138/python-wildcard-search-in-string) – Tadhg McDonald-Jensen Oct 25 '16 at 15:20
  • @Whitty21 That is going to confuse future readers. You should leave your original code that everyone provided answers for – Farhan.K Oct 25 '16 at 15:25
  • @Farhan.K I didn't think about that, and have restored it – Whitty21 Oct 25 '16 at 15:28
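
To illustrate the point made in the comments about `"*1"`: the `in` operator does a literal substring test, so the asterisk is never treated as a wildcard. The sketch below contrasts that with shell-style matching from the standard library's `fnmatch` module. The helper name `ends_with_digit` is hypothetical and only there for illustration.

```python
from fnmatch import fnmatchcase

name = "AGM00BOUCK01"

# `in` looks for the two literal characters "*" and "1" side by side,
# so this test is True for every name in the sample data.
print("*1" not in name)

# fnmatchcase understands shell-style wildcards: "*1" means "ends with 1".
print(fnmatchcase(name, "*1"))


def ends_with_digit(value):
    """Hypothetical helper: True when the last character is 0-9."""
    return fnmatchcase(value, "*[0-9]")


print(ends_with_digit("AGM00BALDWIN"))
print(ends_with_digit("AGM00COBDEN02"))
```

Whether a wildcard pattern or a plain `str.endswith` / slice check reads better is mostly a matter of taste for a case this small.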

3 Answers


There are different ways to do it, one with the isdigit function:

a = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"]

for i in a:
  if i[-1].isdigit():  # can use i[-1] and i[-2] for both numbers
    print (i)


Using regex:
import re
a = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"]

pat = re.compile(r"^.*\d$")  # can use "\d\d" instead of "\d" for 2 numbers
for i in a:
  if pat.match(i): print (i)

Another:

for i in a:
    if i[-1:] in map(str, range(10)): print (i)

All of the above methods print the inputs that have a numeric suffix:

AGM00BOUCK01
AGM00COBDEN01
AGM00COBDEN02
lycuid

Using regular expressions seems appropriate here:

import re

pattern= re.compile(r'(\d+$)')

prov_count = 3000
prov_ID = 0
items = (name, x, y)
xy_tup = tuple(items)

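# search() is needed here: match() only tries the start of the string and returns None, never False
if pattern.search(name) is None: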
    prov_ID = prov_count + 1
else:
    prov_ID = ""
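
One detail worth keeping in mind with the regex route: `re.match` only tries to match at the beginning of the string and returns `None` (not `False`) when nothing matches, so a suffix test generally wants `re.search`. A minimal sketch, where `SUFFIX` and `has_numeric_suffix` are hypothetical names chosen for illustration:

```python
import re

# One or more digits anchored to the end of the string.
SUFFIX = re.compile(r"\d+$")


def has_numeric_suffix(name):
    """Hypothetical helper: True when the name ends in one or more digits."""
    # search() scans the whole string; match() would only try position 0.
    return SUFFIX.search(name) is not None


names = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"]

# Names that should receive an ID (no numeric suffix).
print([n for n in names if not has_numeric_suffix(n)])
# ['AGM00BALDWIN', 'AGM00BOUCK']
```
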
A.Kot

You can use slicing to find the last 2 characters of the element and then check if it ends with '01' or '02':

l = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"]

for i in l:
    if i[-2:] in ('01', '02'):
        print('{} is a duplicate'.format(i))

Output:

AGM00BOUCK01 is a duplicate
AGM00COBDEN01 is a duplicate
AGM00COBDEN02 is a duplicate

Or another way would be using the str.endswith method:

l = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"]

for i in l:
    if i.endswith('01') or i.endswith('02'):
        print('{} is a duplicate'.format(i))

So your code would look like this:

prov_count = 3000
prov_ID = 0
items = (name, x, y)
xy_tup = tuple(items)

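# name[-2:] takes the last two characters (name[-2] is a single character);
# the ID goes to names that do NOT end in a duplicate suffix
if name[-2:] not in ('01', '02'):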
    prov_ID = prov_count + 1
else:
    prov_ID = ""
Farhan.K
  • I simplified it as a test and this isn't working, though it looks like what I was thinking in my head – Whitty21 Oct 25 '16 at 15:47
  • name = ["AGM00BALDWIN", "AGM00BOUCK", "AGM00BOUCK01", "AGM00COBDEN01", "AGM00COBDEN02"] for i in name: if name[-2] in ('01', '02'): print "bad" else: print "good" – Whitty21 Oct 25 '16 at 15:49
  • @Whitty21 which part doesn't work? Do you get an error message? the snippet you posted in your second comment should work fine – Farhan.K Oct 25 '16 at 15:56
  • I have got it working using an unsimplified approach. Now I just seem to have an issue with my loop, as every value is being written as null – Whitty21 Oct 25 '16 at 16:42
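
The loop itself is not shown in the thread, so the cause of the empty values can't be pinned down from here. As a rough sketch of one way such a loop could be laid out, the code below reads rows with the standard `csv` module and advances the counter each time an ID is handed out. The column names (`name`, `x`, `y`), the coordinate values, and the `assign_ids` function are all assumptions made for illustration, not the asker's actual code.

```python
import csv
import io

# Stand-in for the real CSV file; column names and coordinates are made up.
sample = io.StringIO(
    "name,x,y\n"
    "AGM00BALDWIN,1.0,2.0\n"
    "AGM00BOUCK01,3.0,4.0\n"
    "AGM00BOUCK,5.0,6.0\n"
)


def assign_ids(rows, start=3000):
    """Yield (name, x, y, prov_ID); prov_ID is "" for names ending in a digit."""
    prov_count = start
    for row in rows:
        name = row["name"]
        if name[-1:].isdigit():
            prov_id = ""        # numeric suffix: leave the ID blank
        else:
            prov_count += 1     # advance the counter so each ID is unique
            prov_id = prov_count
        yield name, row["x"], row["y"], prov_id


for record in assign_ids(csv.DictReader(sample)):
    print(record)
```

Keeping the counter inside the function and incrementing it only in the branch that assigns an ID avoids every row receiving the same value.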