I am trying to use the MNIST data set of handwritten digits for a project, and I want to read each picture in as a 28 by 28 2D array of ints from 0-255, corresponding to the greyscale value of each pixel. I downloaded the training file (train-images-idx3-ubyte.gz) from their website (http://yann.lecun.com/exdb/mnist/) and am having trouble actually processing it. The site describes the file format as 16 bytes of header info, followed by unsigned bytes, each holding one pixel, organized row-wise. See the website for more details.

In my code I try to read the file into a byte array (which, when I run it, is the same size as the file: 9,912,422 bytes). I then start at the seventeenth byte so as to skip the header, and compensate for the fact that Java treats a byte as a signed integer by adding 128 to the absolute value of all the negative numbers (their first bit was a one). To see if this was working I tried to draw the result using a drawing panel class which I know works, but I only see static; there is no pattern at all to the pixels. What am I doing wrong with handling the file? Thanks!

 File file=new File("train-images-idx3-ubyte.gz");
 long size = file.length(); 
 System.out.println(size);        
 byte[] contents=new byte[(int)size];
 FileInputStream in = new FileInputStream(file);
 in.read(contents);
 in.close();
 DrawingPanel panel = new DrawingPanel(400, 400);
 Graphics g = panel.getGraphics(); 
 int xloc = 0;
 int yloc = 0;                         
 for(int jj = 0; jj < 28; jj++)
 {
    for(int ii = 0; ii < 28; ii++)
    {
       int x = (int) contents[17+jj*28+ii];
       if(x < 0)
       {
          x = (x * (0-1)) + 128;
       }
       System.out.print(x + " ");
       int color = (255 - x);
       g.setColor(new Color(x,x,x));
       g.fillRect(xloc,yloc,10,10);
       xloc += 10;
    }
    System.out.println();
    yloc+= 10;
    xloc = 0;
 }
Justin Sanders
  • You might need a special library to read a GZIP file correctly, q.v. [here](https://stackoverflow.com/questions/35789253/how-to-read-from-gzipinputstream) for a start. – Tim Biegeleisen Jul 27 '17 at 01:57
  • I think you are meant to un-gzip the file first, then read in the uncompressed file. – Greg Kopff Jul 27 '17 at 04:42

2 Answers

For anyone coming across this question in the future: the comments were right, you do have to unzip the .gz file first. However, when I looked into doing that, it seemed really complicated.

While I was looking into that, though, I discovered that a CSV version of the data is readily available online through a quick Google search, so unless you like extracting files yourself, I would recommend using that!
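
For reference, the decompression step the comments describe can also be done from Java itself. The following is only a minimal sketch using `java.util.zip.GZIPInputStream` from the standard library; the class name `MnistGunzip` and the helper `gunzip` are made up for illustration, and the file name is the one from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class MnistGunzip {
    // Hypothetical helper: decompresses a whole gzip stream into memory.
    static byte[] gunzip(InputStream compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(compressed);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            // read() returns -1 once the end of the compressed data is reached
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes the .gz file from the question is in the working directory.
        try (InputStream file = new FileInputStream("train-images-idx3-ubyte.gz")) {
            byte[] contents = gunzip(file);
            // Size of the decompressed data: header plus raw pixel bytes
            System.out.println(contents.length);
        }
    }
}
```

The returned array holds the decompressed bytes, so the header and pixel offsets described in the question apply to it rather than to the raw `.gz` file.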

Justin Sanders
  • To unzip them on Windows, you can use third-party software like [7-Zip](https://www.7-zip.org/). (Otherwise, `gunzip [file-path]` in the command line/terminal should work; the file is a plain gzip file, not a tar archive.) – h4nek Jul 19 '19 at 10:24

Once you have unzipped the data, your code runs nicely on my side, but only after these changes:

  • if(x<0) x+=256; // fix signed int: map a negative byte back to its unsigned value (128-255)

  • if(x>255) x=255; //cap any high value

  • int color = (255 - x);

  • g.setColor(new Color(color,color,color)); // instead of x,x,x
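
One common way to do the signed-to-unsigned conversion in Java is to mask each byte with `& 0xFF`, which yields a value in 0-255 without any branching. The sketch below assumes the decompressed file is already in a byte array and uses the 16-byte header size stated in the question; because arrays are zero-indexed, the first pixel then sits at index 16. The class name `MnistPixels` and the helper `readImage` are hypothetical.

```java
import java.util.Arrays;

public class MnistPixels {
    static final int HEADER_BYTES = 16; // header size as stated in the question

    // Hypothetical helper: returns image number `index` as rows of grey values in 0..255.
    static int[][] readImage(byte[] contents, int index, int rows, int cols) {
        int[][] image = new int[rows][cols];
        int start = HEADER_BYTES + index * rows * cols;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // & 0xFF reinterprets the signed Java byte as an unsigned value
                image[r][c] = contents[start + r * cols + c] & 0xFF;
            }
        }
        return image;
    }

    public static void main(String[] args) {
        // Tiny synthetic 2x2 "image" after a zeroed header, just to show the conversion.
        byte[] demo = new byte[HEADER_BYTES + 4];
        demo[HEADER_BYTES] = (byte) 255;     // stored as -1 in a Java byte
        demo[HEADER_BYTES + 3] = (byte) 128; // stored as -128 in a Java byte
        int[][] image = readImage(demo, 0, 2, 2);
        System.out.println(Arrays.deepToString(image)); // prints [[255, 0], [0, 128]]
    }
}
```

For the real data, a call such as `readImage(contents, 0, 28, 28)` would give the first digit as the 28 by 28 array the question asks for.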

George