I have a text file, text_isbn, that contains many ISBNs. I want to write a script that parses it and writes each ISBN to a new text file, one per line.

So far I have written the regular expression for finding the ISBNs, but I could not get any further:

import re
list = open("text_isbn", "r")
regex = re.compile('(?:[0-9]{3}-)?[0-9]{1,5}-[0-9]{1,7}-[0-9]{1,6}-[0-9]')

I tried to use the following but got an error (I guess the list is not in proper format...)

parsed = regex.findall(list)

How can I parse the file and write the results to a new file (output.txt)?

Here is a sample of the text in text_isbn:

Praxisguide Wissensmanagement - 978-3-540-46225-5
Programmiersprachen - 978-3-8274-2851-6
Effizient im Studium - 978-3-8348-8108-3
mcbetz

2 Answers

How about

import re

isbn = re.compile("(?:[0-9]{3}-)?[0-9]{1,5}-[0-9]{1,7}-[0-9]{1,6}-[0-9]")

matches = []

with open("text_isbn") as isbn_lines:
    for line in isbn_lines:
        matches.extend(isbn.findall(line))
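
To also cover the writing step asked about in the question, a minimal sketch could look like the following. It assumes the output file is named output.txt as in the question and reuses the pattern from the question; the helper name `extract_isbns` is hypothetical and only there for illustration.

```python
import re

# Pattern taken from the question.
ISBN_RE = re.compile(r"(?:[0-9]{3}-)?[0-9]{1,5}-[0-9]{1,7}-[0-9]{1,6}-[0-9]")


def extract_isbns(in_path, out_path):
    """Write every ISBN found in in_path to out_path, one per line."""
    matches = []
    # Collect all matches line by line, as in the loop above.
    with open(in_path) as isbn_lines:
        for line in isbn_lines:
            matches.extend(ISBN_RE.findall(line))
    # Write each ISBN on its own line.
    with open(out_path, "w") as outfile:
        for isbn in matches:
            outfile.write(isbn + "\n")
    return matches
```

Calling `extract_isbns("text_isbn", "output.txt")` would then produce the new file. Returning the list as well is just a convenience for checking the result.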
Jakob Bowyer
  • With the regex expression taken from here: http://stackoverflow.com/questions/4381514/regular-expression-for-an-isbn-13 – Tim Jan 10 '13 at 13:14
  • *cough cough* shadowing the `input` builtin *cough cough* – Katriel Jan 10 '13 at 13:16
  • Just one piece missing: writing to a new text file... Apart from that, it works.. – mcbetz Jan 10 '13 at 13:20
  • You have a list, write it yourself ;) – Jakob Bowyer Jan 10 '13 at 13:22
  • Alright, will try to. But `re.compile` throws out an error (2.7): File "/usr/lib/python2.7/re.py", line 190, in compile return _compile(pattern, flags) File "/usr/lib/python2.7/re.py", line 242, in _compile raise error, v # invalid expression – mcbetz Jan 10 '13 at 13:27
  • Your regex isn't valid for python then? – Jakob Bowyer Jan 10 '13 at 14:47
  • Well, the regex from the answer seems wrong. When I use the regex from my question with the script from the answer, I get what I want... – mcbetz Jan 10 '13 at 14:57
Try this regex (from the Regular Expressions Cookbook):

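import re

regex = re.compile("(?:ISBN(?:-1[03])?:? )?(?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$")

# Open input and output together so both files are closed automatically.
with open("text_isbn", "r") as data, open("output.txt", "w") as outfile:
    for line in data:
        match = regex.search(line)
        if match:  # skip lines that contain no ISBN
            outfile.write('%s\n' % match.group())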

Tested with your sample data. This assumes that each line contains only one ISBN, at the end of the line.

MBarsi
  • Thanks for that answer. It works as well. I marked the first answer as accepted, but yours is good and valid too... – mcbetz Jan 10 '13 at 16:09