Python file read not grabbing all characters

Question

I'm trying to do a very simple read/write of an image file. The eventual goal is to strip out a few extra newlines throughout the file, but I can't even write the unmodified file back correctly.

import os, sys
picture = "IMAGE NAME.jp2"

rawImg = open(picture, "rb").read()
newImg = open("IMAGE NAME 2.jp2", "wb")
newImg.write(rawImg)

The above code should read in the first image, then output it to a new file. When attempting this, however, the second file has a corrupted header (according to GIMP). Opening the two files in Notepad++ shows that the last handful of characters have been dropped. How many characters are dropped depends on the particular source image, ranging from a few to around a hundred. This happens with both .jp2 and .png files.

I have tried using the "rU" option, as well as reading line by line with readline(), to no avail. As a side note, the newline characters should not be converted either, since I want the read/write to be byte-for-byte exact.

Try using the Image library for reading images. `from PIL import Image` `rawImg = Image.open(picture)` `rawImg.save('IMAGE NAME 2.jp2')` — alec, Mar 06 '20 at 23:49
The problem is that you've used a text interface for picture data. Anything in the pixels that doesn't readily map to the default character set, will be lost or cause an error. Don't tell your program it's getting text data when you want some other handling: use an image or binary access package. — Prune, Mar 07 '20 at 00:20
@RajuKomati without the strip, things are getting dropped. Stripping won't fix that issue. — Lame One, Mar 07 '20 at 05:22
@alec the image needs to be cleaned prior to being written, which requires it to be done bytewise. — Lame One, Mar 07 '20 at 05:23
@Prune my understanding was that by using the "wb" tag, I was telling python to interpret it as a binary file. — Lame One, Mar 07 '20 at 05:23
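
One likely cause, not confirmed anywhere in this thread: the snippet in the question never closes `newImg`, so the last buffered chunk may not be flushed to disk before the file is inspected. That would match the symptom of a variable number of trailing bytes going missing. A minimal sketch, assuming that is the problem, which uses `with` blocks so both files are closed (and the output flushed) automatically. The helper name `copy_bytes` is hypothetical and only for illustration.

```python
def copy_bytes(src, dst):
    """Copy src to dst byte-for-byte and return the number of bytes written."""
    # "rb" reads raw bytes: no decoding and no newline translation.
    with open(src, "rb") as fin:
        data = fin.read()

    # Any bytewise cleanup (e.g. removing extra newlines) would go here,
    # operating on `data` as a bytes object.

    # Leaving the with block flushes the write buffer and closes the file,
    # so the tail of the data is on disk before anything else reads it.
    with open(dst, "wb") as fout:
        fout.write(data)

    return len(data)
```

With the file names from the question, this would be called as `copy_bytes("IMAGE NAME.jp2", "IMAGE NAME 2.jp2")`. Calling `newImg.close()` explicitly after `newImg.write(rawImg)` in the original snippet should have the same effect.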
