0

I need your help. I want to read a text file "as a whole" rather than line by line, because my regex doesn't work well on individual lines; it needs the whole text. So far this is what I am doing:

with open(r"AllText.txt") as fp:
    for line in fp:
        for i in re.finditer(regexp_v3, line):
            print i.group()

I need to open my file, read it all, search it for my regex, and print the results. How can I accomplish this?

Bach
Con7e

3 Answers

5

To get all the content of a file, just use file.read():

all_text = fp.read()  # Within your with statement.

all_text is now a single string containing the data in the file.


Note that this will contain newline characters, but if you are extracting things with a regex they shouldn't be a problem.
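Putting the pieces together, a minimal Python 3 sketch might look like the following. The question never shows regexp_v3, so the pattern, the helper name find_in_whole_file, and the demo file contents here are all hypothetical stand-ins.

```python
import os
import re
import tempfile

# Hypothetical stand-in for the question's regexp_v3: a "BEGIN ... END" block
# that may span several lines (re.DOTALL lets "." match newlines too).
pattern = re.compile(r"BEGIN(.*?)END", re.DOTALL)

def find_in_whole_file(path, compiled_pattern):
    """Read the whole file into one string and return every full match."""
    with open(path) as fp:
        all_text = fp.read()
    return [m.group() for m in compiled_pattern.finditer(all_text)]

# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("BEGIN one\ntwo END\nnoise\nBEGIN three END\n")
    demo_path = tmp.name

matches = find_in_whole_file(demo_path, pattern)
os.remove(demo_path)
print(matches)
```

The first match crosses a line break, which is exactly what a line-by-line loop cannot find.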

anon582847382
  • Nice answer, but sshashank124 also provided a solution for newlines :) – Con7e Apr 17 '14 at 12:04
  • @Con7e Be careful with getting rid of newlines. It will make all of the lines indistinguishable and make it harder to work with. I would just leave them in. – anon582847382 Apr 17 '14 at 12:05
  • @Con7e Also if my answer was helpful to you, consider accepting it by clicking the tick below the vote counts so that it turns green and stays green. Thank you. – anon582847382 Apr 17 '14 at 12:13
3

For that use read:

with open("AllText.txt") as fp:
    whole_file_text = fp.read()

Note, however, that your text will contain \n wherever there was a newline in the file.

For example, if this was your text file:

#AllText.txt
Hello
How
Are
You

Your whole_file_text string will be as follows:

>>> whole_file_text
'Hello\nHow\nAre\nYou'

To get rid of the newlines, you can do either of the following:

>>> whole_file_text.replace('\n', ' ')
'Hello How Are You'

>>> whole_file_text.replace('\n', '')
'HelloHowAreYou'
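If you would rather keep the newlines, the regex itself can usually be told how to treat them. This is only a rough sketch using the sample text above; the patterns are made up for illustration and are not the asker's regexp_v3.

```python
import re

whole_file_text = 'Hello\nHow\nAre\nYou'

# Without flags, "." does not match "\n", so the pattern cannot cross lines.
no_flag = re.findall(r'Hello.*Are', whole_file_text)

# re.DOTALL lets "." match newlines as well.
dotall = re.findall(r'Hello.*Are', whole_file_text, re.DOTALL)

# re.MULTILINE makes ^ and $ match at the start and end of every line.
per_line = re.findall(r'^\w+$', whole_file_text, re.MULTILINE)

print(no_flag)   # []
print(dotall)    # ['Hello\nHow\nAre']
print(per_line)  # ['Hello', 'How', 'Are', 'You']
```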
sshashank124
  • You are my hero. You even answered the question that I was gonna ask (regarding new lines). Thank you so much! I'll promote your answer once enough time has passed (4 minutes according to the website) – Con7e Apr 17 '14 at 12:02
  • Sorry, forgot it. Done now. :) – Con7e Apr 17 '14 at 13:08
3

If you don't want to read the entire file into memory, you can use mmap:

Memory-mapped file objects behave like both strings and like file objects.

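import re, mmap

# 'r+' because mmap's default access mode needs a file opened for reading and writing.
with open(r'AllText.txt', 'r+') as f:
    data = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
    # On Python 3, regexp_v3 must be a bytes pattern to search an mmap.
    mo = re.finditer(regexp_v3, data)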
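For Python 3, a read-only variant could be sketched as below. It assumes a bytes pattern, because an mmap is a bytes-like object. The pattern, the helper name find_with_mmap, and the demo file are hypothetical and stand in for regexp_v3 and AllText.txt.

```python
import mmap
import os
import re
import tempfile

# Hypothetical bytes pattern standing in for regexp_v3.
pattern = re.compile(rb"BEGIN(.*?)END", re.DOTALL)

def find_with_mmap(path, compiled_pattern):
    """Search a memory-mapped file without loading it all into a string."""
    with open(path, "rb") as f:
        # ACCESS_READ maps the file read-only, so 'r+' is not needed.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            # Collect the matches before the mapping is closed.
            return [m.group() for m in compiled_pattern.finditer(data)]

# Demo on a throwaway, non-empty file (an empty file cannot be mapped).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"BEGIN one\ntwo END\nnoise\nBEGIN three END\n")
    demo_path = tmp.name

mmap_matches = find_with_mmap(demo_path, pattern)
os.remove(demo_path)
print(mmap_matches)
```

The matches come back as bytes; call .decode() on them with the file's encoding if you need str.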
atupal
  • +1 I love it. Does it work on Windows as well? – emesday Apr 17 '14 at 12:14
  • @mskimm Yeah, but note that you cannot create an empty mapping on Windows; see the [mmap](https://docs.python.org/2.7/library/mmap.html) docs for more :-) – atupal Apr 17 '14 at 12:21
  • This seems unnecessary, [this](http://stackoverflow.com/a/23132390/432913) answer is much simpler. – will Apr 17 '14 at 13:12