How to use Python to find all ISBNs in a text file?

Question

I have a text file, text_isbn, that contains many ISBNs. I want to write a script that parses it and writes each ISBN on its own line in a new text file.

So far I have written the regular expression for finding the ISBNs, but I could not get any further:

import re
list = open("text_isbn", "r")
regex = re.compile('(?:[0-9]{3}-)?[0-9]{1,5}-[0-9]{1,7}-[0-9]{1,6}-[0-9]')

I tried the following but got an error (I guess the list is not in the proper format):

parsed = regex.findall(list)

How do I do the parsing and write the results to a new file (output.txt)?

Here is a sample of the text in text_isbn:

Praxisguide Wissensmanagement - 978-3-540-46225-5
Programmiersprachen - 978-3-8274-2851-6
Effizient im Studium - 978-3-8348-8108-3

Post a snippet of `text_isbn` file in the question body and your regular expression as well. — Ashwini Chaudhary, Jan 10 '13 at 13:13
You're applying `regex.findall` on an open file handle, whereas it's expecting a string. Try calling `open(...).read()` first. — Tim, Jan 10 '13 at 13:16
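
Tim's suggestion can be sketched roughly as follows, assuming the whole file fits comfortably in memory. The pattern is the one from the question; the helper name `find_isbns` and the inline sample string are made up for illustration.

```python
import re

# Pattern copied from the question, written as a raw string.
ISBN_RE = re.compile(r"(?:[0-9]{3}-)?[0-9]{1,5}-[0-9]{1,7}-[0-9]{1,6}-[0-9]")


def find_isbns(text):
    """Return every ISBN-like substring found in a string (hypothetical helper)."""
    return ISBN_RE.findall(text)


# findall expects a string, so pass the file's contents, not the file object:
#     with open("text_isbn") as handle:
#         isbns = find_isbns(handle.read())
sample = (
    "Programmiersprachen - 978-3-8274-2851-6\n"
    "Effizient im Studium - 978-3-8348-8108-3\n"
)
print(find_isbns(sample))
```

Because the pattern uses only non-capturing groups, `findall` returns the full matched ISBN strings rather than group contents.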

Jakob Bowyer · Accepted Answer · 2013-01-10T15:29:11.360

8

How about

import re

isbn = re.compile("(?:[0-9]{3}-)?[0-9]{1,5}-[0-9]{1,7}-[0-9]{1,6}-[0-9]")

matches = []

with open("text_isbn") as isbn_lines:
    for line in isbn_lines:
        matches.extend(isbn.findall(line))
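
The snippet above stops at collecting the matches. One possible way to also write them out, one ISBN per line, is sketched below. It assumes Python 3; `write_isbns` is a hypothetical helper name, and the demo uses in-memory `io.StringIO` objects in place of the real files so that it runs on its own.

```python
import io
import re

# Same pattern as in the question.
ISBN_RE = re.compile(r"(?:[0-9]{3}-)?[0-9]{1,5}-[0-9]{1,7}-[0-9]{1,6}-[0-9]")


def write_isbns(lines, outfile):
    """Write each ISBN found in `lines` to `outfile`, one per line.

    `lines` is any iterable of strings (an open file works), `outfile` is
    any object with a write() method. Returns the number of ISBNs written.
    """
    count = 0
    for line in lines:
        for isbn in ISBN_RE.findall(line):
            outfile.write(isbn + "\n")
            count += 1
    return count


# With the files named in the question, the call would look like:
#     with open("text_isbn") as source, open("output.txt", "w") as target:
#         write_isbns(source, target)
source = io.StringIO("Praxisguide Wissensmanagement - 978-3-540-46225-5\n")
target = io.StringIO()
write_isbns(source, target)
print(target.getvalue(), end="")
```

Passing file-like objects instead of paths keeps the function easy to try out without touching the disk.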

edited Jan 10 '13 at 15:29

answered Jan 10 '13 at 13:13

Jakob Bowyer

With the regex expression taken from here: http://stackoverflow.com/questions/4381514/regular-expression-for-an-isbn-13 – Tim Jan 10 '13 at 13:14
*cough cough* shadowing the `input` builtin *cough cough* – Katriel Jan 10 '13 at 13:16
Just one piece missing: writing to a new text file... Apart from that, it works.. – mcbetz Jan 10 '13 at 13:20
You have a list, write it yourself ;) – Jakob Bowyer Jan 10 '13 at 13:22
Alright, will try to. But `re.compile` throws out an error (2.7): File "/usr/lib/python2.7/re.py", line 190, in compile return _compile(pattern, flags) File "/usr/lib/python2.7/re.py", line 242, in _compile raise error, v # invalid expression – mcbetz Jan 10 '13 at 13:27
Your regex isn't valid for python then? – Jakob Bowyer Jan 10 '13 at 14:47
Well, the regex from the answer seems wrong. When I use the regex from my question with the script from the answer, I get what I want... – mcbetz Jan 10 '13 at 14:57

score 0 · Answer 2 · answered Jan 10 '13 at 14:56

Try this regex (from the Regular Expressions Cookbook):

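import re

# ISBN-10/ISBN-13 pattern; the trailing $ means the ISBN must end the line.
regex = re.compile("(?:ISBN(?:-1[03])?:? )?(?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$")

# Read the input line by line and write each ISBN found to output.txt.
with open("text_isbn", "r") as data, open("output.txt", "w") as outfile:
    for line in data:
        match = regex.search(line)
        if match:  # skip lines that contain no ISBN
            outfile.write('%s\n' % match.group())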

Tested with your sample data. This assumes that each line contains only one ISBN, located at the end of the line.

MBarsi

Thanks for that answer. It works as well. I marked the first answer as accepted, but yours is good and valid too. – mcbetz Jan 10 '13 at 16:09
