I'm trying to create a list from a string of comma-delimited values in Python using split(). When I do this, the list appears to have multiple elements with the same index, apparently because some of the values are identical. I'd like each element to have its own sequential index so that I can access elements positionally. How do I do this? Here is the code for context:

haproxy_socket_data ='''
pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,
fe,FRONTEND,,,0,1,2000,45,0,8415,0,0,45,,,,,OPEN,,,,,,,,,1,1,0,,,,0,0,0,1,,,,0,0,0,45,0,0,,0,1,45,,,
bend,host1,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,113,0,,1,2,1,,0,,2,0,,0,L4OK,,0,0,0,0,0,0,0,0,,,,0,0,
'''
haproxy_socket_data = haproxy_socket_data.splitlines()

for line in haproxy_socket_data:
    stats = line.split(',')
    print line
    print stats
    for i in stats:
        print i
        print "index: %s" % stats.index(i)

Here is the output of this code: https://gist.github.com/wjimenez5271/74df2b16b540a7d9de0c

I found these examples of how to get this data into a list, but none of them addresses my situation, where some values are the same:

How can I split this comma-delimited string in Python?

How to convert comma-delimited string to list in Python?

wjimenez5271
  • See here: http://www.stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-in-python-whilst-preserving-order – mhlester May 29 '14 at 17:15
  • `index` is the wrong function: it returns the first index of the element provided (in this case `i`). If you want to iterate over the list while getting the sequential index you are looking for, you can use `enumerate`. – filmor May 29 '14 at 17:17
  • Do you just want to convert all this into a flat list? If so, you could just re.split(r'[\n\,]+', haproxy_socket_data) – Adam May 29 '14 at 17:18
  • @filmor yes that seems to be an issue, thanks. – wjimenez5271 May 29 '14 at 17:29

4 Answers

You're misunderstanding what index() does. The Python documentation says:

s.index(x[, i[, j]])

index of the first occurrence of x in s (at or after index i and before index j)

So, each time you call stats.index(i) in your code, it returns the index of the first occurrence of i in stats, no matter which occurrence the loop is currently on.
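
To make the optional start argument from the documentation concrete, here is a minimal sketch. The sample list is made up for illustration, and the parenthesized print call is used only so the snippet runs on both Python 2 and Python 3:

```python
# Hypothetical sample: a short row in which the value '0' appears twice.
stats = ['fe', 'FRONTEND', '', '', '0', '1', '0']

first = stats.index('0')              # index of the first occurrence
second = stats.index('0', first + 1)  # search resumes just after the first hit

print("first: %d, second: %d" % (first, second))
```

This works, but it means re-searching the list for every duplicate, which is why iterating with a running index is usually the simpler choice.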

If you want to keep track of the index of elements of a list as you iterate over it, you want enumerate():

for index, item in enumerate(stats):
    print item
    print "index: %s" % index
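
As a rough illustration of the same idea on data shaped like the question's, the sketch below splits a shortened row (an assumption, trimmed from the `fe` line) and pairs every field with its position. It is written to run on both Python 2 and Python 3:

```python
# Shortened, illustrative row; the real line has many more fields.
line = 'fe,FRONTEND,,,0,1,2000,45,0'
stats = line.split(',')

# enumerate() pairs each field with its position, so repeated values
# ('' and '0' here) still get distinct, sequential indexes.
pairs = list(enumerate(stats))

for index, item in pairs:
    print("index: %d value: %r" % (index, item))
```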
Zero Piraeus

If data values have a comma in them, then the straightforward split(",") won't be correct.

Check the csv module. It supports figuring out ("sniffing") the proper split and quote parameters. It also lets you read each row of data into a dictionary, so you can refer to data by name. No more column counting!

Example. Note the backslash after the opening quotes: it suppresses the leading blank line, so the sniffer can read the header from the first line of data:

haproxy_socket_data ='''\
pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,
fe,FRONTEND,,,0,1,2000,45,0,8415,0,0,45,,,,,OPEN,,,,,,,,,1,1,0,,,,0,0,0,1,,,,0,0,0,45,0,0,,0,1,45,,,
bend,host1,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,113,0,,1,2,1,,0,,2,0,,0,L4OK,,0,0,0,0,0,0,0,0,,,,0,0,
'''

import csv, StringIO

dialect = csv.Sniffer().sniff(haproxy_socket_data)

reader = csv.reader( 
    StringIO.StringIO(haproxy_socket_data), dialect=dialect,
    )
for row in reader:
    print row

print

dictr = csv.DictReader( 
    StringIO.StringIO(haproxy_socket_data),
    dialect=dialect,
    )
for drow in dictr:
    print 'svname',drow['svname']

Output:

['pxname', 'svname', 'qcur', 'qmax', 'scur', 'smax', 'slim', 'stot', 'bin', 'bout', 'dreq', 'dresp', 'ereq', 'econ', 'eresp', 'wretr', 'wredis', 'status', 'weight', 'act', 'bck', 'chkfail', 'chkdown', 'lastchg', 'downtime', 'qlimit', 'pid', 'iid', 'sid', 'throttle', 'lbtot', 'tracked', 'type', 'rate', 'rate_lim', 'rate_max', 'check_status', 'check_code', 'check_duration', 'hrsp_1xx', 'hrsp_2xx', 'hrsp_3xx', 'hrsp_4xx', 'hrsp_5xx', 'hrsp_other', 'hanafail', 'req_rate', 'req_rate_max', 'req_tot', 'cli_abrt', 'srv_abrt', '']
['fe', 'FRONTEND', '', '', '0', '1', '2000', '45', '0', '8415', '0', '0', '45', '', '', '', '', 'OPEN', '', '', '', '', '', '', '', '', '1', '1', '0', '', '', '', '0', '0', '0', '1', '', '', '', '0', '0', '0', '45', '0', '0', '', '0', '1', '45', '', '', '']
['bend', 'host1', '0', '0', '0', '0', '', '0', '0', '0', '', '0', '', '0', '0', '0', '0', 'UP', '1', '1', '0', '0', '0', '113', '0', '', '1', '2', '1', '', '0', '', '2', '0', '', '0', 'L4OK', '', '0', '0', '0', '0', '0', '0', '0', '0', '', '', '', '0', '0', '']

svname FRONTEND
svname host1
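
The opening point, that a plain split(",") goes wrong when a value itself contains a comma, can be sketched as follows. The row is hypothetical (the sample data in the question has no quoted fields), and the snippet is written to run on both Python 2 and Python 3:

```python
import csv

# Hypothetical row in which one quoted value contains a comma.
line = 'bend,"host1,backup",0,UP'

naive = line.split(',')            # splits inside the quoted value
parsed = next(csv.reader([line]))  # honors the quotes

print(naive)
print(parsed)
```

Here csv.reader is given a one-element list because it accepts any iterable of lines, not only file objects.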

johntellsall

It seems like you have duplicate indices because list.index() in Python returns the index of the first occurrence of a value. Try a for loop that walks the indices themselves rather than a for ... in loop that only yields the values.

Zac Lozano

If you want to keep the index, use a for loop with enumerate, or with range():

haproxy_socket_data = """
pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,
fe,FRONTEND,,,0,1,2000,45,0,8415,0,0,45,,,,,OPEN,,,,,,,,,1,1,0,,,,0,0,0,1,,,,0,0,0,45,0,0,,0,1,45,,,
bend,
"""
haproxy_socket_data = haproxy_socket_data.splitlines()
for line in haproxy_socket_data:
    stats = [item for item in line.split(',') if len(item) >= 1]  # Drops empty fields (''); note that this shifts the positions of the remaining fields
    print line
    print stats
    for ind, it in enumerate(stats):
        print it
        print "index: %d" % ind

Or, use range(len()):

haproxy_socket_data ="""
pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,
fe,FRONTEND,,,0,1,2000,45,0,8415,0,0,45,,,,,OPEN,,,,,,,,,1,1,0,,,,0,0,0,1,,,,0,0,0,45,0,0,,0,1,45,,,
bend,
"""
haproxy_socket_data = haproxy_socket_data.splitlines()
for line in haproxy_socket_data:
    stats = [item for item in line.split(',') if len(item) >= 1]  # Drops empty fields (''); note that this shifts the positions of the remaining fields
    print line
    print stats
    for i in range(len(stats)):
        print stats[i]
        print "index: %d" % i

list.index() returns the first occurrence of the item:

>>> item = [1, 2, 5, 7, 3, 3, 8, 9, 5]
>>> item.index(5)
2
>>> item[2]
5
>>> item[8]
5
>>> 

Using enumerate():

>>> for ind, it in enumerate(item):
...     if it == 5:
...         print ind
... 
2
8
>>> 
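
If the goal is to collect every position of a value rather than print them, the same enumerate() loop can be folded into a list comprehension. This is a small sketch reusing the sample list above; `positions` is a name introduced here for illustration:

```python
item = [1, 2, 5, 7, 3, 3, 8, 9, 5]

# Keep the index of every element equal to 5, not just the first one.
positions = [ind for ind, it in enumerate(item) if it == 5]

print(positions)
```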
A.J. Uppal