inconsistent indentation with Python after split

Question

Edit in progress; will re-submit sometime later.

>>> _Good Evening everyone_ Somewhere is evening, somewhere midnight and yet somewhere midday :) — Op De Cirkel, Jul 13 '11 at 04:03
I'm a little confused... why are you asking about the Python code, when you say you're having problems with the MySQL import? It seems to me that the Python does exactly what it's supposed to do (although you haven't completely specified what exactly the Python script is supposed to produce). — David Z, Jul 13 '11 at 04:16
@David Zaslavsky: With the current Python code the result file has some inconsistent indentation. If I want to import it into a MySQL table, I need perfectly consistent formatting with a hard tab between each column. The result file from Python should be a four-column text file with a hard tab between each column :) — madkitty, Jul 13 '11 at 04:24
@madkitty: hm, well that's what I thought I saw when I looked at the sample results you posted. Anyway it seems that you have some answers now. — David Z, Jul 13 '11 at 04:28
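A minimal sketch of the target format described in these comments. It assumes each record has already been parsed into four fields; the sample values are hypothetical and reuse the coordinates quoted in the answers. This swaps in Python's standard `csv` module with `delimiter='\t'`. That module writes exactly one tab between columns, which matches the default field separator of MySQL's `LOAD DATA INFILE` (`FIELDS TERMINATED BY '\t'`).

```python
import csv
import io

def write_tab_rows(rows, out):
    """Write each 4-field row as one line with a single hard tab between columns."""
    # lineterminator='\n' avoids csv's default '\r\n' line endings
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    for row in rows:
        writer.writerow(row)

# Hypothetical parsed records: strand, chromosome, start, end
records = [('+', '6', '140302505', '140302604')]
buf = io.StringIO()
write_tab_rows(records, buf)
print(repr(buf.getvalue()))  # '+\t6\t140302505\t140302604\n'
```

For a real output file, the `csv` docs recommend opening it with `open(path, 'w', newline='')` and passing that file object as `out`.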

Fergus Barker · Answer 1 · 2011-07-13T04:29:45.613

0

You can use .strip() to remove any whitespace around each item before writing it out. This makes the output cleaner and should fix the indentation issues.

For example:

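b = [s.strip() for s in a.split('chr')]   # split() returns a list, so strip each piece
c = [s.strip() for s in b[1].split(':')]  # no whitespace either side now
d = [s.strip() for s in c[1].split('..')]
e = '\t'.join([b[0], c[0], d[0], d[1]]) + '\n'  # four columns, no trailing tab
rfh.write(e)  # a and rfh come from the asker's original loop (not shown here)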

This removes any existing whitespace, so only your \t separators remain.
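Because `split()` returns a list, `.strip()` has to be applied to each piece rather than to the list itself. Here is a minimal, self-contained sketch of the steps above. It assumes a hypothetical input field such as `'  +chr6 : 140302505..140302604  '`, with stray spaces around the separators. The function name `to_tab_row` is illustrative only.

```python
def to_tab_row(field):
    """Turn '+chr6:140302505..140302604' (with stray spaces) into a tab-separated row."""
    strand, rest = [s.strip() for s in field.split('chr', 1)]
    chrom, coords = [s.strip() for s in rest.split(':', 1)]
    start, end = [s.strip() for s in coords.split('..', 1)]
    # Join with single tabs; no trailing tab, so there are exactly four columns
    return '\t'.join([strand, chrom, start, end]) + '\n'

print(repr(to_tab_row('  +chr6 : 140302505..140302604  \n')))
# '+\t6\t140302505\t140302604\n'
```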

edited Jul 13 '11 at 04:29

answered Jul 13 '11 at 04:10

okay but now I'm not sure how to use .strip().. can you show me a quick example? :) – madkitty Jul 13 '11 at 04:17

techiev2 · Answer 2 · 2011-07-13T09:06:30.450

0

Why not use a regex split?

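import re

infile = 'input.txt'  # hypothetical: path to your input file (a string)

with open(infile) as inf:
    for annot_info in inf:  # annot_info is one line of the file
        line = annot_info.rstrip('\n')
        split_array = re.split(r'(\W+)(chr\w+):(\d+)\.\.(\d+)', line)
        # do your SQL processing here
        # write out to a file if you wish to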

For a line such as +chr6:140302505..140302604, this gives ['', '+', 'chr6', '140302505', '140302604', '']. You can use these fields in your current MySQL insert code.

PS: This pattern leaves empty strings at the beginning and end of the list. Either modify the regex, or have your SQL insert skip the first and last elements of the list (for example, split_array[1:-1]).
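Here is a quick check of the split described in this answer. It assumes a line that contains only the strand and coordinates; this is a hypothetical sample reconstructed from the output quoted above. Stripping the trailing newline first matters, because otherwise the last element is `'\n'` instead of `''`. The helper name `split_fields` is illustrative, and it assumes exactly one match per line.

```python
import re

# Same idea as the answer's pattern, with the dots escaped so '..' only matches two literal dots
PATTERN = re.compile(r'(\W+)(chr\w+):(\d+)\.\.(\d+)')

def split_fields(line):
    """Return the four captured fields, dropping the empty strings at both ends."""
    parts = PATTERN.split(line.rstrip('\n'))
    return parts[1:-1]

print(PATTERN.split('+chr6:140302505..140302604'))
# ['', '+', 'chr6', '140302505', '140302604', '']
print(split_fields('+chr6:140302505..140302604\n'))
# ['+', 'chr6', '140302505', '140302604']
```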

edited Jul 13 '11 at 09:06

answered Jul 13 '11 at 04:25

oh I like the idea. But the input is a whole file of 60GB. Should I write annot_info = input_file_name.txt ?? – madkitty Jul 13 '11 at 05:53
I used annot_info as a placeholder variable for a line. Modified the code in the example just to be sure I don't sound ambiguous. :-) – techiev2 Jul 13 '11 at 09:07
I tried this but it returns an error message `with open (infile, mode='r', buffering=-1) as in_f, open (outfile, mode='w', buffering=-1) as out_f: TypeError: coercing to Unicode: need string or buffer, file found` `# Opens each file to read/modify infile=open('110331_HS1A_1_rtTA.result','r') outfile=open('2.txt','w') import re with open (infile, mode='r', buffering=-1) as in_f, open (outfile, mode='w', buffering=-1) as out_f: f = (i for i in in_f if i.rstrip())` etc... same as you posted – madkitty Jul 13 '11 at 13:47
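The `TypeError` in the comment above appears because `infile` and `outfile` were already file objects returned by `open()` and were then passed to `open()` a second time. `open()` expects a path. Here is a minimal sketch that passes path strings and streams the input line by line, so memory use does not grow with the size of the 60GB file. The function name `copy_non_empty_lines` and the temporary file names are hypothetical; the `out_f.write(line)` step stands in for your own per-line parsing.

```python
import os
import tempfile

def copy_non_empty_lines(in_path, out_path):
    """Stream in_path to out_path, skipping blank lines.

    in_path and out_path are path strings; open() is called only here,
    once per file. Iterating over the file object reads one line at a
    time, so a very large input is never loaded into memory at once.
    """
    count = 0
    with open(in_path) as in_f, open(out_path, 'w') as out_f:
        for line in in_f:
            if not line.strip():
                continue  # skip empty / whitespace-only lines
            out_f.write(line)  # replace with your per-line parsing
            count += 1
    return count

# Usage sketch with throwaway files; real code would pass your own path strings
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'in.txt')
dst = os.path.join(tmp, 'out.txt')
with open(src, 'w') as f:
    f.write('a\tb\n\n   \nc\td\n')
print(copy_non_empty_lines(src, dst))  # 2
```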

JBernardo · Accepted Answer · 2011-07-13T04:34:34.693

0

That should work:

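import re  # Regex may be the easiest way to split that line

# infile and outfile must be path strings, not objects returned by open()
pattern = re.compile(r'^1\.\.100\t([+-])chr(\d+):(\d+)\.\.(\d+).+$')

with open(infile) as in_f, open(outfile, 'w') as out_f:
    f = (i for i in in_f if i.rstrip())  # iterate over non-empty lines
    for line in f:
        _, k = line.split('\t', 1)  # drop the first column
        x = pattern.findall(k)
        if not x:
            continue  # skip lines that don't match the expected layout
        out_f.write('\t'.join(x[0]) + '\n')  # hard tabs between columns for MySQL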
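The following sketch applies the accepted answer's steps to a single line to show what its regex expects. The sample line is hypothetical and reconstructed from the pattern: a first column, then `1..100`, then strand and coordinates, then at least one more character. The sketch uses `re.match` instead of `re.findall`; with the `^...$` anchors, both find at most one match. The name `parse_line` is illustrative.

```python
import re

# Same regex as the answer, with the literal dots escaped
PATTERN = re.compile(r'^1\.\.100\t([+-])chr(\d+):(\d+)\.\.(\d+).+$')

def parse_line(line):
    """Return a tab-separated 4-column row, or None if the line doesn't match."""
    if '\t' not in line:
        return None  # no first column to drop
    _, k = line.split('\t', 1)
    m = PATTERN.match(k)
    if m is None:
        return None
    return '\t'.join(m.groups()) + '\n'

sample = 'read1\t1..100\t+chr6:140302505..140302604\tmore\n'  # hypothetical
print(repr(parse_line(sample)))  # '+\t6\t140302505\t140302604\n'
```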

edited Jul 13 '11 at 04:34

answered Jul 13 '11 at 04:26

// Regex may be the easiest way to split that line // Sure is, such a life saver. :-) – techiev2 Jul 13 '11 at 04:28