Converting JPG to txt causes change in file size in Python

Question

I have a set of images which are saved in .jpg format. I use the following commands in Python to load them and store them in a txt file in comma-separated value format.
The original set of images is barely 800 MB in size. However, when I save them in the txt file, they form a 40 GB txt document.

I was wondering if this makes any sense?

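import os
import numpy as np
import scipy.misc

# imagePath and savepath are directory paths defined elsewhere
for filename in os.listdir(imagePath):
    if filename != '.DS_Store':
        # decode the image and flatten its pixel array into one long vector
        b = scipy.misc.imread(os.path.join(imagePath, filename), flatten=0)
        b2 = np.reshape(b, np.size(b))
        var = ','.join(['%d' % num for num in b2])
        # append one comma-separated row per image
        with open(savepath + 'trainMatrix.txt', "a") as f:
            f.write(var + '\n')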

Images are saved in binary mode, not text. What is the goal, just to save them as one archive? — ninMonkey, Oct 20 '13 at 00:13
Here's the procedure: each image is imported into Python, its RGB matrix is converted to one long vector, and that vector is saved as a row in the txt file. So if we have, say, 100 images, we have 100 rows in the txt file, each containing the pixel values of one image. — LoveAppleCider, Oct 20 '13 at 00:21

Developer · Accepted Answer · 2013-10-20T01:09:34.877

It seems there is a misunderstanding about what you want to do with the image files. The following shows the two possible cases, based on your question.

To read a JPG file into a TXT file without analyzing the image data (i.e., no decompressing), use the following. It is not clear what this would be useful for, though.

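import os
from scipy.misc import imread
import numpy as np

imagePath = 'c:/your jpgs/'
savepath = imagePath

# Save the raw file bytes as text, without decompressing (Python 2)
for filename in os.listdir(imagePath):
    if filename != '.DS_Store' and filename[-3:] == 'jpg':
        # os.listdir returns bare names, so prepend the directory
        with open(os.path.join(imagePath, filename), 'rb') as fin:
            b = fin.read()
        # In Python 2, b is a str, so join iterates over its single-byte characters
        out = ','.join(b) + '\n'
        # The with block closes the file; no explicit close() is needed
        with open(savepath + 'trainMatrix1.txt', 'a') as fut:
            fut.write(out)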

The output looks like this:

ÿ,Ø,ÿ,à, ,,J,F,I,F, ,,,, ,d, ,d, , ,ÿ,á,

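To read a JPG file into a TXT file with the image data analyzed (i.e., decompressed), use the following, which relies on imread to decompress the image data. Remember that JPG is a heavily compressed format, so the decompressed data is much larger than the file on disk. Writing it as text inflates it further: each channel value is a single byte in memory but takes up to four characters (three digits plus a comma) in the file. Since you are appending every image to one file, the output will be huge.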

# Save the decompressed image as text, one value per color channel
for filename in os.listdir(imagePath):
    if filename != '.DS_Store' and filename[-3:] == 'jpg':
        # imread decodes the JPEG; flatten() turns the pixel array into one long vector
        b = imread(os.path.join(imagePath, filename), flatten=0).flatten()
        print(b.shape)
        out = ','.join('%d' % i for i in b) + '\n'
        print(len(out))
        with open(savepath + 'trainMatrix2.txt', 'a') as fut:
            fut.write(out)

The output looks like this (color data):

255,255,255,245,245,245,125,125,125,72,72,72,17,17,17,2,2,2,15
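
The size difference can be checked with simple arithmetic. The sketch below is only an estimate under stated assumptions: the 640 x 480 RGB image and the helper name `text_size` are made up for illustration and do not come from the question. It counts how many characters one image needs when written in the comma-separated format used above.

```python
def text_size(values):
    """Number of characters needed to write the values as one CSV line."""
    # Same formatting as the answer: '%d' per value, commas between, newline at end.
    return len(','.join('%d' % v for v in values) + '\n')

# Hypothetical image: 640 x 480 pixels, 3 channels (R, G, B), one byte each.
width, height, channels = 640, 480, 3
raw_bytes = width * height * channels   # decoded size in memory

# Worst case: every value has three digits (100..255).
worst = text_size([255] * raw_bytes)
# Best case: every value has a single digit (0..9).
best = text_size([0] * raw_bytes)

print(raw_bytes)   # 921600
print(worst)       # 3686400, i.e. 4 characters per value
print(best)        # 1843200, i.e. 2 characters per value
```

So the text file is roughly two to four times the decoded size, and the decoded size is itself considerably larger than the JPG on disk. Taken together, a jump from 800 MB to 40 GB is plausible.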

Thank you very much. Well, I'm very new to this material, but one use of storing the images in a txt format is that I can later use them to train a neural network. The libraries for neural networks usually take in an M*N matrix (in this case M is the number of images, and N is the length of an image when converted to a row vector). — LoveAppleCider, Oct 20 '13 at 00:51
@user2639876 Why don't you store them directly in a MAT file? Then it would be easier to manipulate them. I would recommend preprocessing the data using unsupervised learning (e.g. RBM, Autoencoders, CNN) — Edgar Andrés Margffoy Tuay, Oct 20 '13 at 01:18
@user2639876 Note that a pixel in an image contains three values, for the `Red`, `Green` and `Blue` components. It may also contain an `Alpha` component. So a pixel is not a single value. If you use `gray scale` images with no alpha, then a pixel can be represented by a single value. Based on your comment, the second solution, in which decompression happens, is the one that helps you. Check [Color RGB](http://en.wikipedia.org/wiki/RGB_color_model) and [RGB to Gray](http://stackoverflow.com/questions/687261/converting-rgb-to-grayscale-intensity). If you are happy with the answer, please accept it. — Developer, Oct 20 '13 at 01:18
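
For the neural-network use case mentioned in the comments, a binary format avoids the text overhead altogether. The following is a minimal sketch, assuming NumPy's own `.npy` format rather than the MAT file suggested above, and using random data in place of real images. The matrix shape and the variable names are hypothetical.

```python
import io
import numpy as np

# Stand-in for M decoded images, each flattened to N uint8 values.
M, N = 100, 32 * 32 * 3
rng = np.random.RandomState(0)
train = rng.randint(0, 256, size=(M, N)).astype(np.uint8)

# Text form, as in the answer: one comma-separated row per image.
text = ''.join(','.join('%d' % v for v in row) + '\n' for row in train)

# Binary form: np.save writes a small header plus one byte per uint8 value.
buf = io.BytesIO()
np.save(buf, train)
binary_size = buf.tell()

# Round trip: the matrix comes back with the same shape and dtype.
buf.seek(0)
restored = np.load(buf)

print(len(text), binary_size)
print(restored.shape, restored.dtype)
```

With real data, `np.save('trainMatrix.npy', train)` and `np.load('trainMatrix.npy')` would take the place of the in-memory buffer (the file name here is only an example).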
