I apologize that this question is somewhat vague; I'm very new to Python.

I need to parse a tab delimited text file. It's a very large file and from it I am trying to identify and extract specific things. For example, if one line was:

[apple banana cherry date]

I want to search for and identify the term "apple" and then extract the term "date".

Then, I need to access the list of extracted terms and use them (for comparisons with other lists, etc.).

I have read about regular expressions, but while they seem to be good for searching, I don't know how to use them to extract terms other than the searched keyword. Also, I'm not sure how to access or manipulate the array of results after parsing.

Any help/direction/pointers/suggestions/examples would be amazing.

Thank you so much!

  • Please break your problem into smaller problems and then try to search for answers. – Ansuman Bebarta Jul 03 '13 at 08:03
  • possible duplicate of [parsing a tab-separated file in Python](http://stackoverflow.com/questions/11059390/parsing-a-tab-separated-file-in-python) – RedBaron Jul 03 '13 at 08:04

2 Answers

If a file is tab delimited, that's usually a sign that you should use the csv module:

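>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:  # Python 3; in Python 2 open with 'rb' instead
...     reader = csv.reader(csvfile, dialect=csv.excel_tab)
...     for row in reader:
...         print(row)  # each row is a list of the tab-separated fields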

It's hard to say more without any specific example.
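
Going by the example line in the question, here is a minimal sketch of the search-and-extract step. It assumes the keyword ("apple") sits in the first column and the term to extract ("date") in the last one; the function name `extract_terms`, its parameters, and the sample data are hypothetical, and Python 3 is assumed.

```python
import csv
import io

def extract_terms(lines, key="apple", key_col=0, value_col=-1):
    """Return the value_col field of every tab-delimited row whose key_col field equals key."""
    reader = csv.reader(lines, delimiter="\t")
    # Skip empty rows, keep only rows whose key column matches
    return [row[value_col] for row in reader if row and row[key_col] == key]

# Hypothetical sample data standing in for the real file;
# for a real file, pass the object returned by open(path, newline='')
sample = io.StringIO(
    "apple\tbanana\tcherry\tdate\n"
    "kiwi\tlemon\tmango\tnectarine\n"
    "apple\tfig\tgrape\thoneydew\n"
)
extracted = extract_terms(sample)
print(extracted)  # ['date', 'honeydew']

# The result is an ordinary list, so it can be compared with other lists,
# for example through set intersection
other_terms = ["date", "elderberry"]
common = set(extracted) & set(other_terms)
print(common)  # {'date'}
```

Because the reader yields one row at a time, this approach should also cope with a very large file without loading it all into memory.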

alecxe

http://docs.python.org/2/library/re.html

Here's a simple example:

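import re
# Hypothetical sample input so the example runs as-is
your_input_string = '<img src="data:image/png;base64, iVBORw0KGgo=">'
# This regular expression detects base-64 encoded PNG images.
# Named groups use the (?P<name>...) syntax.
regex = r'(?P<src>data:image/png;base64, (?P<data>[^"]*))'
# you can then either
# a) collect every match; with two groups, findall returns (src, data) tuples
matches = re.findall(regex, your_input_string)
for m in matches:
    # address your matches with index notation
    src = m[0]
    data = m[1]
# b) take the first match and address the groups by name
match = re.search(regex, your_input_string)
if match:  # re.search returns None when nothing matches
    src = match.group('src')
    data = match.group('data')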
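
Applied to the example line from the question, the same capture-group idea might look like the sketch below. It assumes tab-separated fields with "apple" as the first field and the wanted term as the last; the group name `last` and the sample text are hypothetical.

```python
import re

# Match lines that start with "apple" and capture the final tab-separated field.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
pattern = re.compile(r"^apple\t(?:.*\t)?(?P<last>[^\t\n]+)$", re.MULTILINE)

# Hypothetical sample text standing in for the file contents
text = "apple\tbanana\tcherry\tdate\nkiwi\tlemon\tmango\tnectarine\n"

# finditer yields match objects; the named group holds the extracted term
terms = [m.group("last") for m in pattern.finditer(text)]
print(terms)  # ['date']
```

The searched keyword and the extracted term are different parts of the pattern: the keyword is matched literally, and only the parenthesized group is returned.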
Lorcan O'Neill
  • has the csv module offended you in some way? – LtWorf Jul 03 '13 at 08:07
  • I've never heard of the csv module before. Also, OP said he didn't know how to use regular expressions to extract terms. Since I did know something about that, that was the answer I contributed. – Lorcan O'Neill Jul 03 '13 at 08:21