Python get String behind multiple comma

Question

The problem is that I need to read a text.txt file and extract very specific data from it. The entries in text.txt look like this:

b(1,4,8,1,4,TEST,0,3,AAAA,Test,2-150,000)
a(1,1,3,1,3,BBBB,0,3,BBBB,Test,2-150,000)
a(1,0,2,1,4,TEST,0,3,CCCC,Test,2-150,000)
b(1,1,0,1,4,TEST,0,3,DDDD,Test,2-150,000)

I only want the lines that start with "a(", and from those I only need the strings after the 5th and 8th comma. For line 2 that would be BBBB, BBBB.

My code so far is:

infile = open("text.txt","r") 
numlines = 0
found = []

for line in infile:
 numlines += 1
 if "a" in line:
  line=line[line.find("(")+1:line.find(")")]
  found.append(line.split(','))

wordLed=len(found)
for i in range(0,wordLed):
    print found[i]


infile.close()

This just gives me the full lines separated at the "," but how can I index through them?

What exactly are you trying to parse? They almost look like function calls... could you elaborate? — Jon Clements, Aug 26 '14 at 14:12

Sylvain Leroux · Accepted Answer · 2014-08-26T14:26:33.237

4

The ~~quick~~ short and dirty:

with open('text.txt') as f:
    result = [line.split(',')[5:9:3] for line in f if line.startswith("a(")]
#                            ^^^^^^^
#                       "5 to 9 (excl.) by step of 3"
#                       that is items 5 and 5+3
#
#                       replace by [5] if you only want the item at index 5
#                       replace by [5:9] if you want items from 5 to 9 (excl.)

from pprint import pprint
pprint(result)

dirty because of the lack of error handling...

... anyway, given your sample data, this produces:

[['BBBB', 'BBBB'], ['TEST', 'CCCC']]
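
For the missing error handling, a minimal sketch could look like the following. The helper name `extract_fields` and the decision to silently skip lines with too few fields are assumptions made for illustration; they are not part of the answer above.

```python
def extract_fields(lines, prefix="a(", indexes=(5, 8)):
    """Collect the requested comma-separated fields from lines starting with prefix."""
    result = []
    for line in lines:
        if not line.startswith(prefix):
            continue
        fields = line.rstrip("\n").split(",")
        # Assumption: a line with too few fields is skipped instead of raising.
        if len(fields) <= max(indexes):
            continue
        result.append([fields[i] for i in indexes])
    return result


sample = [
    "b(1,4,8,1,4,TEST,0,3,AAAA,Test,2-150,000)\n",
    "a(1,1,3,1,3,BBBB,0,3,BBBB,Test,2-150,000)\n",
    "a(1,0,2,1,4,TEST,0,3,CCCC,Test,2-150,000)\n",
    "a(too,short)\n",
    "b(1,1,0,1,4,TEST,0,3,DDDD,Test,2-150,000)\n",
]
print(extract_fields(sample))
```

With a real file, the same function should work on the open file object, e.g. `with open('text.txt') as f: result = extract_fields(f)`.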

edited Aug 26 '14 at 14:26

answered Aug 26 '14 at 14:13

Sylvain Leroux



I like this solution the best, however, why 5:9:3... am I missing something? Nevermind... I understand.. and now I like it just a little bit less. :D – dss539 Aug 26 '14 at 14:17
@dss539 that's for selecting both values in one shot. – Raydel Miranda Aug 26 '14 at 14:19
@dss539 `[5:9:3]` means _"take values from 5 to 9 (excl.) by step of 3"_. It's a _trick_ to get item 5 and 5+3. – Sylvain Leroux Aug 26 '14 at 14:20
Sounds nice. Is it then possible to work with just one value of "result", like "CCCC", or do I need to index the whole thing again? Like print result[2,2] would then be CCCC – GEnGEr Aug 26 '14 at 14:21
@GEnGEr Dude, what do you mean by just one value? Please elaborate on your question. – Raydel Miranda Aug 26 '14 at 14:24

@GEnGEr This is the [slice notation](http://stackoverflow.com/questions/509211/pythons-slice-notation). Notice too that the result is a list of lists. Given the code in the answer, if you only want to _display_ the column with `CCCC` you can write `for row in result: print(row[1])`. Feel free to *ask another question* if you need more information. – Sylvain Leroux Aug 26 '14 at 14:27
Ahh ok, that http://stackoverflow.com/questions/509211/pythons-slice-notation sounds like just the stuff I need now. Yes, it was intended to get a list of lists. – GEnGEr Aug 26 '14 at 14:32
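
To make the indexing discussed in these comments concrete, here is a small sketch using the output shown in the answer. The variable names are only illustrative. Note that Python lists are indexed from zero and one level at a time, so `result[2,2]` would raise a `TypeError`; the value `CCCC` sits at `result[1][1]`.

```python
# The list of lists produced by the answer for the sample data.
result = [['BBBB', 'BBBB'], ['TEST', 'CCCC']]

# Index the outer list first (row), then the inner list (column).
value = result[1][1]
print(value)

# One column across all rows.
second_column = [row[1] for row in result]
print(second_column)

# The extended slice from the answer, applied to one split line:
fields = "a(1,1,3,1,3,BBBB,0,3,BBBB,Test,2-150,000)".split(',')
picked = fields[5:9:3]  # start at 5, stop before 9, step 3 -> items 5 and 8
print(picked)
```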

score 3 · Answer 2 · answered Aug 26 '14 at 14:20

3

I would use the readlines function:

with open("data.txt", "r") as f:
    lines = f.readlines()
for line in lines:
    if line[0:2] == 'a(':
        fields = line.split(',')  # split once and reuse the list
        data1 = fields[5]
        data2 = fields[8]
        print(data1, data2)
# no f.close() needed: the with block already closed the file

answered Aug 26 '14 at 14:20

Northern


If a line is empty (ie ""), `line[0:2]` will crash the program – Pphoenix Aug 26 '14 at 14:22

That's right. Error handling is necessary to make it robust. Or simply check the length of the line and of the split list before using `line[0:2]`, `line.split(',')[5]` and `line.split(',')[8]` – Northern Aug 26 '14 at 14:35
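
A quick check of the behaviour discussed in these comments, as a sketch: in Python, slicing past the end of a string returns a shorter (possibly empty) string and does not raise, so `line[0:2]` is safe on an empty line. Indexing the split list is the part that can raise `IndexError` when a line has too few fields. The helper `safe_pick` is a hypothetical name used only for this example.

```python
empty = ""
print(repr(empty[0:2]))  # slicing never raises for out-of-range bounds

short = "a(1,2)".split(",")
try:
    short[5]
except IndexError:
    print("line has only", len(short), "fields")


def safe_pick(line, indexes=(5, 8)):
    """Return the requested fields as a tuple, or None if the line does not qualify."""
    fields = line.rstrip("\n").split(",")
    if line[0:2] != "a(" or len(fields) <= max(indexes):
        return None
    return tuple(fields[i] for i in indexes)


print(safe_pick("a(1,1,3,1,3,BBBB,0,3,BBBB,Test,2-150,000)\n"))
print(safe_pick(""))
```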

score 1 · Answer 3 · answered Aug 26 '14 at 14:08

You should check the full condition at the start of the line, i.e. a( instead of a. You can also use split to turn the string into a list, splitting on the comma:

infile = open("text.txt", "r")

for line in infile:
    if line.startswith("a("):  # Starts with a(
        data = line.split(',')
        print(data[5])  # Print data at place 5
        print(data[8])  # Print data at place 8

infile.close()

dss539 · Answer 4 · 2014-08-26T14:24:30.017

1

for line in [l for l in infile if l.startswith('a(')]:
    line = line[line.find('('):].strip('()\n').split(',')
    a_field, other_field = line[5], line[8]

You split the string already, just index into the list to get the fields you want.
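
Applied to the `found` list from the question, that indexing could look like the sketch below. The two rows are hard-coded here to stand in for what the question's loop would collect from the sample file once only the `a(` lines are kept; that filtering is an assumption of this example.

```python
# Stand-in for the question's `found` list: each entry is one line,
# already stripped of "a(" and ")" and split on commas.
found = [
    "1,1,3,1,3,BBBB,0,3,BBBB,Test,2-150,000".split(","),
    "1,0,2,1,4,TEST,0,3,CCCC,Test,2-150,000".split(","),
]

# Each entry is a plain list, so the wanted fields are at indexes 5 and 8.
for fields in found:
    print(fields[5], fields[8])

# Or collect them as pairs for later use.
pairs = [(fields[5], fields[8]) for fields in found]
```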

edited Aug 26 '14 at 14:24

answered Aug 26 '14 at 14:11

dss539


@Raydel the OP had problems with extracting specific fields. I can include the filtering in my answer, too, I suppose – dss539 Aug 26 '14 at 14:13
Don't forget the `'\n'` that may end a line. A small modification can answer the question: `s[s.find('('):].strip('()\n').split(',')` – Taha Aug 26 '14 at 14:22
@Taha Added your suggestion. Thanks. – dss539 Aug 26 '14 at 14:24
