Edit in progress; will re-submit sometime later.
Viewed 661 times
-
>>> _Good Evening everyone_ Somewhere it's evening, somewhere it's midnight, and somewhere else it's midday :) – Op De Cirkel Jul 13 '11 at 04:03
-
I'm a little confused... why are you asking about the Python code when you say you're having problems with the MySQL import? It seems to me that the Python code does exactly what it's supposed to do (although you haven't completely specified what the Python script is supposed to produce). – David Z Jul 13 '11 at 04:16
-
@David Zaslavsky: With the current Python code, the result file has inconsistent indentation. To import it into a MySQL table, I need consistent formatting with a hard tab between each column. The Python script should produce a four-column text file with a hard tab between columns :) – madkitty Jul 13 '11 at 04:24
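A minimal sketch of producing that four-column, tab-separated layout with Python's standard csv module (my choice of tool, not something from the thread). The function name write_tsv is hypothetical, and the row values are illustrative only, shaped like the regex output posted further down this page.

```python
import csv
import io

def write_tsv(rows, out):
    """Write rows as tab-separated lines: exactly one hard tab between columns."""
    # lineterminator='\n' avoids csv's default '\r\n' line endings
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    for row in rows:
        # strip stray spaces so only the tab separators remain
        writer.writerow([field.strip() for field in row])

# Hypothetical four-column row
buf = io.StringIO()
write_tsv([['+', '6', '140302505', '140302604']], buf)
print(repr(buf.getvalue()))  # '+\t6\t140302505\t140302604\n'
```

For a real output file, the csv docs recommend opening it with newline='', e.g. open('2.txt', 'w', newline='').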
-
@madkitty: hm, well that's what I thought I saw when I looked at the sample results you posted. Anyway, it seems that you have some answers now. – David Z Jul 13 '11 at 04:28
3 Answers
You can use .strip() to remove any whitespace around each item before writing it out. This makes the output cleaner and solves the indentation issues.
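For example (note that .strip() works on strings, not on the lists returned by .split(), so it has to be applied to each field):
b = a.split('chr')
c = b[1].split(':')
d = c[1].split('..')
# Strip whitespace from each field so only the '\t' separators remain
fields = [b[0].strip(), c[0].strip(), d[0].strip(), d[1].strip()]
# Join with tabs; no trailing '\t' before the newline (it would add an empty fifth column)
e = '\t'.join(fields) + '\n'
rfh.write(e)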
This removes any existing whitespace and leaves only your \t separators.
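A self-contained version of the .strip() approach, assuming each input line looks like +chr6:140302505..140302604 (the shape implied by the regex output in another answer on this page). The function name to_tsv_line is hypothetical.

```python
def to_tsv_line(a):
    """Split a line like '+chr6:140302505..140302604' into four tab-separated fields."""
    b = a.split('chr')    # for that input: ['+', '6:140302505..140302604']
    c = b[1].split(':')   # ['6', '140302505..140302604']
    d = c[1].split('..')  # ['140302505', '140302604']
    # .strip() is called on each string (not on the list) to drop
    # surrounding spaces, tabs and the trailing newline
    fields = [b[0].strip(), c[0].strip(), d[0].strip(), d[1].strip()]
    return '\t'.join(fields) + '\n'

print(repr(to_tsv_line('  +chr6:140302505..140302604 \n')))
# '+\t6\t140302505\t140302604\n'
```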

Fergus Barker
-
Okay, but now I'm not sure how to use .strip(). Can you show me a quick example? :) – madkitty Jul 13 '11 at 04:17
Why not use a regex split?
import re
# infile is the path (a string) of your input file
with open(infile) as inf:
    for annot_info in inf:  # annot_info holds one line at a time
        split_array = re.split(r'(\W+)(chr\w+):(\d+)\.\.(\d+)', annot_info.strip())
        # do your SQL processing here
        # write out to a file if you wish to
This would give you ['', '+', 'chr6', '140302505', '140302604', '']. You can use the same list in your current MySQL methods.
PS: The regex pattern I've used gives you empty strings at the beginning and end. Either modify the regex or change your SQL insert to exclude the first and last elements of the array (split_array[1:-1]).
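A small sketch of the regex split with the empty ends dropped, assuming each line holds exactly one annotation of the +chr6:140302505..140302604 form. The names PATTERN and split_annotation are hypothetical; the dots are escaped so they match literally.

```python
import re

# \.\. matches two literal dots; an unescaped '..' would match any two characters
PATTERN = re.compile(r'(\W+)(chr\w+):(\d+)\.\.(\d+)')

def split_annotation(annot_info):
    """Return [strand, chrom, start, end] for one annotation line."""
    parts = PATTERN.split(annot_info.strip())
    # re.split puts the text before and after the match at both ends
    # (empty strings here), so keep only the captured groups.
    # A non-matching line gives [line], which slices to [].
    return parts[1:-1]

print(split_annotation('+chr6:140302505..140302604\n'))
# ['+', 'chr6', '140302505', '140302604']
```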

techiev2
-
Oh, I like the idea. But the input is a whole 60 GB file. Should I write annot_info = input_file_name.txt? – madkitty Jul 13 '11 at 05:53
-
I used annot_info as a placeholder variable for a line. Modified the code in the example just to be sure I don't sound ambiguous. :-) – techiev2 Jul 13 '11 at 09:07
-
I tried this, but it returns an error: `with open (infile, mode='r', buffering=-1) as in_f, open (outfile, mode='w', buffering=-1) as out_f: TypeError: coercing to Unicode: need string or buffer, file found` My code: `# Opens each file to read/modify infile=open('110331_HS1A_1_rtTA.result','r') outfile=open('2.txt','w') import re with open (infile, mode='r', buffering=-1) as in_f, open (outfile, mode='w', buffering=-1) as out_f: f = (i for i in in_f if i.rstrip())` etc., the same as you posted. – madkitty Jul 13 '11 at 13:47
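The TypeError above comes from passing an already-open file object (infile=open(...)) to open(), which expects a file name. Below is a minimal sketch that passes the names directly and streams the file one line at a time, so a 60 GB input never has to fit in memory. It assumes each line contains one annotation of the +chr6:140302505..140302604 shape; the function name convert and the use of re.search (instead of re.split, so the four groups come back directly) are my own choices.

```python
import re

PATTERN = re.compile(r'(\W+)(chr\w+):(\d+)\.\.(\d+)')

def convert(in_path, out_path):
    """Stream in_path line by line and write tab-separated fields to out_path."""
    # Pass the *file names* to open(); passing an already-open file object
    # is what triggers a TypeError like the one quoted above.
    with open(in_path) as in_f, open(out_path, 'w') as out_f:
        for line in in_f:  # reads one line at a time
            m = PATTERN.search(line)
            if m is None:
                continue  # skip blank or non-matching lines
            out_f.write('\t'.join(m.groups()) + '\n')

# convert('110331_HS1A_1_rtTA.result', '2.txt')
```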
This should work:
import re  # Regex may be the easiest way to split that line
# infile and outfile must be file names (strings), not already-open file objects
with open(infile) as in_f, open(outfile, 'w') as out_f:
    f = (i for i in in_f if i.rstrip())  # iterate over non-empty lines
    for line in f:
        _, k = line.split('\t', 1)
        x = re.findall(r'^1\.\.100\t([+-])chr(\d+):(\d+)\.\.(\d+).*$', k)
        if not x:
            continue
        out_f.write('\t'.join(x[0]) + '\n')  # tab-separated, as the question requires
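To check the pattern on a single line, here is a small sketch. The sample lines are hypothetical, built only to fit the pattern (an ID column, then 1..100, then the annotation), and parse is a made-up helper name. The pattern here escapes the literal dots and ends in .*, so the final number is captured whole even when nothing follows it on the line (with a trailing .+, the regex would backtrack and hand the last digit to .+).

```python
import re

PATTERN = r'^1\.\.100\t([+-])chr(\d+):(\d+)\.\.(\d+).*$'

def parse(line):
    """Return the tab-joined fields for one line, or None if it doesn't match."""
    _, k = line.split('\t', 1)   # drop the first column
    x = re.findall(PATTERN, k)   # list of 4-tuples, one per match
    if not x:
        return None
    return '\t'.join(x[0]) + '\n'

# Hypothetical line: <id>\t1..100\t<strand>chr<n>:<start>..<end>
print(repr(parse('read_1\t1..100\t+chr6:140302505..140302604\n')))
# '+\t6\t140302505\t140302604\n'
```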

JBernardo
-
// Regex may be the easiest way to split that line // Sure is, such a life saver. :-) – techiev2 Jul 13 '11 at 04:28