
I have a tab-delimited txt file like this:

A   B   aaaKP
C   D   bbbZ

This is tab-delimited.

Given

phrase = aaa
column = 3

I would like to keep only those rows whose 3rd column starts with aaa.

The expected output is:

A   B   aaaKP

So far I have only tried this in MATLAB, and my attempt is tedious: it relies on very slow if and for statements together with findstr.

I could not find a better way to do it in MATLAB.

user3123767
    That's nice. I prefer python over Matlab as well. How about showing us what you've tried in python? – Gerrat Jul 24 '14 at 17:45

2 Answers

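phrase, column = 'aaa', 3

def matches(fields):
    # True when the row has enough columns and the target column starts with the phrase
    return len(fields) >= column and fields[column - 1].startswith(phrase)

# Copy the matching rows from input.txt to output.txt
with open('input.txt') as src, open('output.txt', 'w') as dst:
    dst.writelines(row for row in src if matches(row.split('\t')))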
Corei13
  • Thank you! It prints the output correctly. Can I save the output as output.txt ? – user3123767 Jul 24 '14 at 18:09
  • Thank you it works! What if there are multiple phrases? For example what if I want to get rows whose 3rd column is either aaa or bbb or ccc ? I tried "phrase, column = {'aaa','bbb','ccc'}, 3" but it didn't work. – user3123767 Jul 25 '14 at 04:41
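On the multiple-phrases question in the comment above: a set does not work there because the comparison expects a single string. One possible approach, sketched here under the assumption that "either aaa or bbb or ccc" means the column should start with any of them, relies on the fact that str.startswith also accepts a tuple of prefixes. The helper name filter_rows and the sample rows are made up for illustration and are not part of the original answer.

```python
def filter_rows(lines, phrases, column):
    """Yield lines whose column-th (1-based) tab-separated field starts with any phrase."""
    prefixes = tuple(phrases)  # str.startswith accepts a str or a tuple, not a set or list
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        # Skip rows that are too short to have the requested column
        if len(fields) >= column and fields[column - 1].startswith(prefixes):
            yield line

# Hypothetical sample rows standing in for the lines of input.txt
sample = ["A\tB\taaaKP\n", "C\tD\tbbbZ\n", "E\tF\tzzz\n", "G\tH\n"]
selected = list(filter_rows(sample, {'aaa', 'bbb', 'ccc'}, 3))
```

With a real file, the same generator could be fed an open file object instead of the sample list, and the result passed to writelines on the output file.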

The simplest method I can think of:

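with open('tab-delimited.txt', 'r') as f:
    for l in f:
        fields = l.rstrip('\n').split('\t')
        # Skip rows with fewer than 3 columns; keep rows whose 3rd column starts with 'aaa'
        if len(fields) >= 3 and fields[2][:3] == 'aaa':
            print(l.rstrip('\n'))  # drop the trailing newline so print does not double it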

If you need help understanding Python slicing, see this question.
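As a quick illustration of the slicing used above (a small sketch with made-up variable names): field[:3] takes the first three characters, and slicing past the end of a short string simply returns what is there instead of raising an error. The startswith method gives the same answer for a prefix test.

```python
field = 'aaaKP'
prefix = 'aaa'

# Slice off as many leading characters as the prefix has, then compare
by_slice = field[:len(prefix)] == prefix

# Equivalent prefix test using the string method
by_method = field.startswith(prefix)

# Slicing beyond the end of a string does not raise; it returns the whole string
short = 'aa'[:3]
```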

dwitvliet