Read text in Python

Question

In my script, I check the first two characters of each text file. If they are "[{", I treat the file as JSON and run some further code on it.

However, I have to open the file twice with `with open(f, 'r', encoding='utf-8', errors='ignore') as infile:`, which duplicates code. Is there a better way to write this?

import glob
import json

result = []

for f in glob.glob("D:/xxxxx/*.txt"):
    print("file_name: ", f)
    with open(f, 'r', encoding='utf-8', errors='ignore') as infile:
        first_two_char = infile.read(2)
        print(first_two_char)
        if first_two_char == "[{":
            # Second open of the same file: this is the duplication in question
            with open(f, 'r', encoding='utf-8', errors='ignore') as infile:
                json_file = json.load(infile, strict=False)
                print(len(json_file))
                result.append(json_file)  # append the parsed JSON content

print(len(result))

I suppose you could always use [seek](https://python-reference.readthedocs.io/en/latest/docs/file/seek.html) to reset the cursor rather than reopening the file. — Anthony Labarre, Aug 17 '20 at 16:04
Your approach is wrong. Instead of making sure if it's JSON and reading, just **TRY** reading it as JSON and if it doesn't work, do nothing... — Tomerikoo, Aug 17 '20 at 16:10
@Tomerikoo Thanks a lot! Yes, you are right. I have changed my code accordingly. It looks better and works well. Thanks again. — rui jiang, Aug 17 '20 at 16:59
@AnthonyLabarre Thank you! You really answered my question. Next time when I come across with this issue, I will try `seek`. — rui jiang, Aug 17 '20 at 17:01

score 1 · Accepted Answer · answered Aug 17 '20 at 16:10

You could call seek(0) to move the file pointer back to the start of the file. In general, seeking to an arbitrary offset doesn't work with files opened as text, because there is an intermediate cache for bytes-to-string decoding. But seek(0) and seeking to the end of the file do work.

import glob
import json

result = []

for f in glob.glob("D:/xxxxx/*.txt"):
    print("file_name: ", f)
    with open(f, 'r', encoding='utf-8', errors='ignore') as infile:
        first_two_char = infile.read(2)
        print(first_two_char)
        if first_two_char == "[{":
            infile.seek(0)  # rewind to the start instead of reopening the file
            json_file = json.load(infile, strict=False)
            print(len(json_file))
            result.append(json_file)  # append the parsed JSON content

print(len(result))
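
To try the rewind on its own, here is a minimal sketch that writes a throwaway file and reads it back. The sample data and the temporary file are assumptions for illustration, not part of the original script.

```python
import json
import os
import tempfile

# Write a small JSON document to a temporary file (hypothetical sample data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('[{"id": 1}, {"id": 2}]')
    path = tmp.name

with open(path, "r", encoding="utf-8", errors="ignore") as infile:
    prefix = infile.read(2)   # moves the cursor past the first two characters
    infile.seek(0)            # rewinding to offset 0 is allowed in text mode
    data = json.load(infile)  # parses the whole file from the beginning

os.remove(path)               # clean up the temporary file
print(prefix, len(data))
```

Without the seek(0) call, json.load would start from the third character and fail to parse.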

But really, just attempting the conversion and catching the error is a better way to go. What if the first two characters only looked right by bad luck?

import glob
import json

result = []

for f in glob.glob("D:/xxxxx/*.txt"):
    print("file_name: ", f)
    with open(f, 'r', encoding='utf-8', errors='ignore') as infile:
        try:
            result.append(json.load(infile))
        except json.JSONDecodeError:
            pass  # not valid JSON, skip this file

print(len(result))
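
To make the "bad luck" case concrete, here is a rough sketch with in-memory strings instead of files. The helper name try_load_json and the sample strings are made up for illustration.

```python
import json


def try_load_json(text):
    """Return the parsed JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Hypothetical file contents: the second one passes the "[{" prefix check
# but is not actually JSON.
samples = {
    "valid": '[{"id": 1}]',
    "bad_luck": "[{this only looks like JSON",
    "plain": "just some text",
}

parsed = {name: try_load_json(text) for name, text in samples.items()}
print(parsed)
```

The prefix check would accept "bad_luck" and then crash inside json.load, while the try/except version simply skips it. One caveat of this sketch: returning None cannot be told apart from a file that contains only the JSON value null, so use a different sentinel if that matters.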
