Finding the index of multiple variations of a string

Question

I need to get the indices of the first name and account number columns in a CSV file.

The header row varies between files, so it might look like any of these:

data = ['account number', 'first name']
or
data = ['account #', 'First Name']
or
data = ['ACCOUNT NUMBER', 'FIRST NAME'] etc.
or
data = ['...',.....,'account num',...,'firstname']

From what I have found so far (Stack Overflow), I can use l.index('first name') to get the index. Reading the definition in the Python tutorial, it seems to accept only a single value to search for.

Any idea how I can get the index when the header is any of those variants?

Can you explain more about your problem? Maybe an expected output could help! — Mazdak, Apr 16 '15 at 13:37
Does each CSV file have the field names on the first line of the file? — PM 2Ring, Apr 16 '15 at 13:42
I want to be able to do something like data.index(['first name', 'FIRST NAME', 'firstname', 'First Name']) and get the index. For the first three lists I would get an index of 1, whereas for the fourth list I might get some other index, say 4. And yes, the first line holds all the headers of the CSV file. — rak1n, Apr 16 '15 at 13:44

Julien Spronck · Answer 1 · 2015-04-16T13:54:52.480

0

You can use a list comprehension:

idx = [i for i, item in enumerate(data) if item.lower() == 'first name']

or more generally:

alist = ['first name', 'firstname']  # or ['account number', 'account #', ...]
idx = [i for i, item in enumerate(data) if item.lower() in alist]
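
Note that the comprehension returns a list of every matching index. If you only want the first one, a rough sketch along these lines may help. It assumes the header is the first row of the file; the in-memory file contents and the names first_name_options and first_name_idx are made up for illustration.

```python
import csv
import io

# Stand-in for an open CSV file; the contents are invented for this example.
csv_file = io.StringIO("Account #,First Name\n1001,Ada\n")

header = next(csv.reader(csv_file))  # first row holds the column names
first_name_options = ['first name', 'firstname']

# next() yields the first matching index, or the default -1 if none match.
first_name_idx = next(
    (i for i, item in enumerate(header) if item.lower() in first_name_options),
    -1,
)
print(first_name_idx)  # 1
```

The -1 default mirrors str.find(); you could just as well pass None and test for it.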

You can also use regular expressions for more complex cases:

import re
# pattern is a regular expression of your choosing, e.g. r'(?i)first\s?name'
idx = [i for i, item in enumerate(data) if re.search(pattern, item)]
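
As a concrete instance of the regex variant, here is a minimal sketch. The pattern is a hypothetical one chosen for illustration (it also tolerates underscores), so adjust it to the spellings you actually see in your files.

```python
import re

# Hypothetical pattern: "first", optional spaces/underscores, "name",
# matched against the whole header, ignoring case.
pattern = re.compile(r'^first[\s_]*name$', re.IGNORECASE)

data = ['ACCOUNT NUMBER', 'First Name', 'lastname']
idx = [i for i, item in enumerate(data) if pattern.search(item)]
print(idx)  # [1]
```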

edited Apr 16 '15 at 13:54

answered Apr 16 '15 at 13:47

What about account number, where it might say 'account #'? Could I throw an or condition in there, if that's possible? – rak1n Apr 16 '15 at 13:48

Mazdak · Answer 2 · 2015-04-16T14:04:18.317

0

You can use re.match within a list comprehension:

import re
indices = [i for i, s in enumerate(data) if re.match(r'^(account.*)|(first\s?name)$', s, re.I)]

The following regex:

r'^(account.*)|(first\s?name)$'

matches any string that starts with account, or any string made up of first, an optional whitespace character, and then name. The re.I (IGNORECASE) flag makes the match ignore the case of your string.
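
One thing to watch: because the first alternative accepts anything starting with account, a header such as 'account type' is matched too. The sketch below shows that, plus a tighter variant with one pattern per field. The names account_re and first_re and the exact alternatives are assumptions for illustration, not part of the answer above.

```python
import re

headers = ['country', 'lastname', 'account num', 'account type', 'firstname']

# The pattern from the answer: anything starting with "account" matches.
loose = re.compile(r'^(account.*)|(first\s?name)$', re.I)
print([i for i, s in enumerate(headers) if loose.match(s)])  # [2, 3, 4]

# Hypothetical tighter variant: one pattern per field, and fullmatch()
# requires the whole header to match.
account_re = re.compile(r'account\s*(number|num|#)', re.I)
first_re = re.compile(r'first\s?name', re.I)
account_idx = [i for i, s in enumerate(headers) if account_re.fullmatch(s)]
first_idx = [i for i, s in enumerate(headers) if first_re.fullmatch(s)]
print(account_idx, first_idx)  # [2] [4]
```

Keeping the fields in separate patterns also tells you which column is which, which a single combined regex does not.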

edited Apr 16 '15 at 14:04

answered Apr 16 '15 at 13:56

score 0 · Answer 3 · answered Apr 16 '15 at 14:17

Here's one way to do it, using sets. If no string matches the options for a field, then -1 is returned for its index, similar to str.find().

#!/usr/bin/env python

accnums = set(['account number', 'account #', 'account num', 'accnum'])
firstnames = set(['first name', 'firstname', '1stname'])

def find_fields(seq):
    accnum, firstname = (-1, -1)
    for i, field in enumerate(seq):
        field = field.lower()
        if field in accnums:
            accnum = i
        elif field in firstnames:
            firstname = i
    return accnum, firstname

testdata = [
    ['account number', 'first name'],
    ['account #', 'First Name'],
    ['ACCOUNT NUMBER', 'FIRST NAME'],
    ['accnum', '1stname'],
    ['country', 'lastname', 'account num', 'account type', 'firstname'],
    ['accnum', '1stname', 'account #'],
    ['albatross', 'first name'],
    ['Account Number', 'duck'],
]

for data in testdata:
    print(data, find_fields(data))

output

['account number', 'first name'] (0, 1)
['account #', 'First Name'] (0, 1)
['ACCOUNT NUMBER', 'FIRST NAME'] (0, 1)
['accnum', '1stname'] (0, 1)
['country', 'lastname', 'account num', 'account type', 'firstname'] (2, 4)
['accnum', '1stname', 'account #'] (2, 1)
['albatross', 'first name'] (-1, 1)
['Account Number', 'duck'] (0, -1)

Note that if it finds multiple matching entries for a field it returns the index of the last matching field. Thus for ['accnum', '1stname', 'account #'] it returns 2 as the index for the account number field.

You can expand the if: ... elif: block in find_fields() to handle more fields with varying names, but if you have a lot of these fields then it would be better to modify the logic so that it's working with a list of sets rather than with individual sets.
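
A rough sketch of that generalisation, using a dict of sets so that each field has a name. FIELD_ALIASES and its keys are hypothetical; the strip() call is an extra assumption to tolerate stray spaces around header names.

```python
# Hypothetical mapping from a canonical field name to its accepted spellings.
FIELD_ALIASES = {
    'accnum': {'account number', 'account #', 'account num', 'accnum'},
    'firstname': {'first name', 'firstname', '1stname'},
}

def find_fields(seq, aliases=FIELD_ALIASES):
    """Return {canonical name: index of the last matching header, or -1}."""
    found = {name: -1 for name in aliases}
    for i, field in enumerate(seq):
        field = field.strip().lower()
        for name, options in aliases.items():
            if field in options:
                found[name] = i
                break  # a header can belong to at most one field
    return found

print(find_fields(['country', 'lastname', 'account num', 'account type', 'firstname']))
# {'accnum': 2, 'firstname': 4}
```

Adding a new field then only means adding an entry to the dict. As before, the last matching header wins when there are duplicates.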
