-3

For the class cs50, I have to read in jpeg files byte by byte from a memory card in order to look at the header information. The file compiles well, but whenever I execute the file, it returns a "segmentation fault(core dumped)" message.

Edit) Okay, now I know why I have to use an "unsigned char" instead of "int*". Can someone tell me how I can store information into files within scope for this particular code? Right now, I am trying to store information outside of an if() condition, and I don't think the fread function is actually accessing the "image" file I opened.

#include <stdio.h>
#include <string.h>
#include <math.h>

FILE * image = NULL;

int main(int argc, char* argv[])
{
    FILE* infile = fopen("card.raw", "r");
    if (infile == NULL)
    {
        printf("Could not open.\n");
        fclose(infile);
        return 1;
    }
    unsigned char storage[512]; 
    int number = 0;
    int b = floor((number) / 100);
    int c = floor(((number) - (b * 100))/ 10);
    int d = floor(((number) - (b * 100) - (c * 10)));
    int writing = 0;
    char string[5];
    char* extension = ".jpg";

    while (fread(&storage, sizeof(storage), 1, infile))
    {

        if (storage == NULL)
        {
            break;
        }

        if (storage[0] == 0xff && storage[1] == 0xd8 && storage[2] == 0xff)
        {
            if (storage[3] == 0xe0 || storage[3] == 0xe1)
            {

                if (image != NULL)
                {
                    fclose(image);
                }
                sprintf(string, "%d%d%d%s", b, c, d, extension);
                image = fopen(string, "w");
                number++;
                writing = 1;
                if (writing == 1 && storage != NULL)
                {
                    fwrite(storage, sizeof(storage), 1, image);
                }
            }
        } 

        if (writing == 1 && storage != NULL)
        {
            fwrite(storage, sizeof(storage), 1, image);
        }

        if (storage == NULL)
        {
            fclose(image);
        }

    }

    fclose(image);
    fclose(infile);
    
    return 0;
}

This is the problem set just in case my explanation is not clear.

recover

In anticipation of this problem set, I spent the past several days snapping photos of people I know, all of which were saved by my digital camera as JPEGs on a 1GB CompactFlash (CF) card. (It’s possible I actually spent the past several days on Facebook instead.) Unfortunately, I’m not very good with computers, and I somehow deleted them all! Thankfully, in the computer world, "deleted" tends not to mean "deleted" so much as "forgotten." My computer insists that the CF card is now blank, but I’m pretty sure it’s lying to me. Write in ~/Dropbox/pset4/jpg/recover.c a program that recovers these photos.

Ummm. Okay, here’s the thing. Even though JPEGs are more complicated than BMPs, JPEGs have "signatures," patterns of bytes that distinguish them from other file formats. In fact, most JPEGs begin with one of two sequences of bytes. Specifically, the first four bytes of most JPEGs are either 0xff 0xd8 0xff 0xe0 or 0xff 0xd8 0xff 0xe1 from first byte to fourth byte, left to right. Odds are, if you find one of these patterns of bytes on a disk known to store photos (e.g., my CF card), they demark the start of a JPEG. (To be sure, you might encounter these patterns on some disk purely by chance, so data recovery isn’t an exact science.)

Fortunately, digital cameras tend to store photographs contiguously on CF cards, whereby each photo is stored immediately after the previously taken photo. Accordingly, the start of a JPEG usually demarks the end of another. However, digital cameras generally initialize CF cards with a FAT file system whose "block size" is 512 bytes (B). The implication is that these cameras only write to those cards in units of 512 B. A photo that’s 1 MB (i.e., 1,048,576 B) thus takes up 1048576 ÷ 512 = 2048 "blocks" on a CF card. But so does a photo that’s, say, one byte smaller (i.e., 1,048,575 B)! The wasted space on disk is called "slack space." Forensic investigators often look at slack space for remnants of suspicious data.

The implication of all these details is that you, the investigator, can probably write a program that iterates over a copy of my CF card, looking for JPEGs' signatures. Each time you find a signature, you can open a new file for writing and start filling that file with bytes from my CF card, closing that file only once you encounter another signature. Moreover, rather than read my CF card’s bytes one at a time, you can read 512 of them at a time into a buffer for efficiency’s sake. Thanks to FAT, you can trust that JPEGs' signatures will be "block-aligned." That is, you need only look for those signatures in a block’s first four bytes.

Realize, of course, that JPEGs can span contiguous blocks. Otherwise, no JPEG could be larger than 512 B. But the last byte of a JPEG might not fall at the very end of a block. Recall the possibility of slack space. But not to worry. Because this CF card was brand- new when I started snapping photos, odds are it’d been "zeroed" (i.e., filled with 0s) by the manufacturer, in which case any slack space will be filled with 0s. It’s okay if those trailing 0s end up in the JPEGs you recover; they should still be viewable. Now, I only have one CF card, but there are a whole lot of you! And so I’ve gone ahead and created a "forensic image" of the card, storing its contents, byte after byte, in a file called card.raw . So that you don’t waste time iterating over millions of 0s unnecessarily, I’ve only imaged the first few megabytes of the CF card. But you should ultimately find that the image contains 16 JPEGs. As usual, you can open the file programmatically with fopen , as in the below. FILE* file = fopen("card.raw", "r"); Notice, incidentally, that ~/Dropbox/pset4/jpg contains only recover.c, but it’s devoid of any code. (We leave it to you to decide how to implement and compile recover!) For simplicity, you should hard-code "card.raw" in your program; your program need not accept any command-line arguments. When executed, though, your program should recover every one of the JPEGs from card.raw, storing each as a separate file in your current working directory. Your program should number the files it outputs by naming each , ###.jpg where ### is three-digit decimal number from 000 on up. (Befriend sprintf.) You need not try to recover the JPEGs' original names. To
check whether the JPEGs your program spit out are correct, simply double-click and take a look! If each photo appears intact, your operation was likely a success! Odds are, though, the JPEGs that the first draft of your code spits out won’t be correct. (If you open them up and don’t see anything, they’re probably not correct!) Execute the command below to delete all JPEGs in your current working directory. rm *.jpg If you’d rather not be prompted to confirm each deletion, execute the command below instead. rm -f *.jpg Just be careful with that -f switch, as it "forces" deletion without prompting you.

Community
  • 1
  • 1
Jessup Jong
  • 31
  • 1
  • 1
  • 6
  • You have very basic errors in your program. Going through a tutorial on basic IO in C might be helpful. – R Sahu Jul 06 '15 at 04:42
  • Try to understand [`FILE* infile = fopen("card.raw", "r");`](http://linux.die.net/man/3/fopen). What does this do and what are the meanings of the arguments. – Mohit Jain Jul 06 '15 at 04:43
  • FILE* infile is a pointer to the file called "card.raw" and fopen is the function to read the information from "card.raw," right? – Jessup Jong Jul 06 '15 at 04:48
  • Add to the list of things that are near-certainly wrong, `int* storage[512]` declares 512 pointers to `int`, not a pointer to 512 `int`. – WhozCraig Jul 06 '15 at 04:55
  • Possible duplicate of [Read a file as byte array](http://stackoverflow.com/questions/22059189/c-reading-any-file-as-byte-array). It addresses your problem at `int* storage[512]`. Also see [Reading a binary file 1 byte at a time](http://stackoverflow.com/q/6093224). – jww Jul 06 '15 at 05:07
  • Yeah, I actually read that thread, and I now understand that I have to use a "unsigned char", but I still have to know how to access files out of scope. Also, it still returns segmentation fault. – Jessup Jong Jul 06 '15 at 05:17
  • @JessupJong edit your code to include the changes you have made relating to the unsigned char issue. – M.M Jul 07 '15 at 05:13
  • [Debug your code](http://ericlippert.com/2014/03/05/how-to-debug-small-programs/) to find where a segmentation fault might come from. This code is supicious: `int* number = 0; int b = floor((*number) / 100);`. You form a null pointer and then dereference that pointer, which causes undefined behaviour (and might give a segmentation fault). You don't seem to go on to use number,b,c,d later so you could just delete that whole bit. – M.M Jul 07 '15 at 05:14
  • `sprintf(string,` also causes UB because `string` does not point to valid memory – M.M Jul 07 '15 at 05:15
  • when the fopen() fails, do not call fclose() as the file is not open, so it cannot be closed. However, rather than call fclose(), call perror() (read the man page for details.) Suggest: read the man pages for fopen, fclose, floor, then read about pointers and arrays – user3629249 Jul 07 '15 at 13:46
  • Note, depending on the file system, the method of 'delete'ing a file varies. For DOS and similar, the method is to set the first character of the file name in the directory structure to 0x00. To access that file, a program could read the entries in the directory, when finding a file name that begins with 0x00, change that char to 'D' However, other structures use other methods. I.E. learn the details before performing any writes. Note: do not add any new pictures as they can/will overwrite portions of exiting/deleted .jpg files – user3629249 Jul 07 '15 at 13:56
  • why do you use `floor`? `(number) / 100` is exactly the same as `floor((number) / 100)` except that it doesn't involve conversions to and from `double` and a double truncation, so it's much faster – phuclv Aug 29 '16 at 03:11
  • You don't need to write your own. If you *actually* had this problem, use https://www.cgsecurity.org/wiki/PhotoRec or http://foremost.sourceforge.net/ on the block device to scan for magic numbers of known file types metadata types. – Peter Cordes Jun 05 '19 at 10:04

2 Answers2

3
int* storage[512];

You define a pointer to a memory location for 512 ints, but you don't actually reserve the space (only the pointer.

I suspect you just want

int storage[512];

After this, storage is still a pointer, but now it actually points to 512 ints. Though I still think you don't want this. You need 'bytes' not ints. The nearest C has are unsigned char. So the final declaration is:

unsigned char storage[512];

Why? Because read reads into consecutive bytes. If you read into ints, then you will read 4 bytes into each int (because an int occupies 4 bytes).

jcoppens
  • 5,306
  • 6
  • 27
  • 47
  • I see, that helps so much. Thanks! Do you also know how I can fwrite to files that were open in previous if() conditions? I have to figure out the scope. – Jessup Jong Jul 06 '15 at 05:01
  • Exactly the same way. As long as you didn't modify 'storage'. Don't forget to mark the answer as such (click the 'V' left) and/or as useful (up-arrow) if it was in fact useful... – jcoppens Jul 06 '15 at 05:07
  • I have, but it won't publicly display until I have 15 reputations. Thanks jcoppens. So you're saying that I can still access the file from another if() condition even when out of scope? – Jessup Jong Jul 06 '15 at 05:12
  • @JessupJong The scope of the variables in C is determined where the declaration is made. As storage is declared at the 'top level' of `main`, you can access it anywhere after the declaration (well, normally, at least). – jcoppens Jul 06 '15 at 05:41
  • I see! That is perfect. @Rafaf Tahsin, I would choose it as an answer, but the code still returns segmentation fault(core dumped) even after I changed it to a unsigned char and took away all the * pointer symbols from storage. – Jessup Jong Jul 06 '15 at 05:49
  • If any answer is partially correct, it is encouraged to accept the answer. If you get far better answer, you may change your opinion. The effort someone take to answer a question should be appreciated when it helps, whether partially or fully. Thanks :D – Rafaf Tahsin Jul 06 '15 at 06:13
  • `int* storage[512];` means that memory for 512 pointers is reserved. Your first paragraph would be a correct description for `int (*storage)[512];`, which is different. – M.M Jul 07 '15 at 05:11
  • After `int storage[512]`, storage is an *array*, not a pointer. It consists of 512 ints. – M.M Jul 07 '15 at 05:12
0

There are a number of problems in your program. The first is that you have not opened the file in binary mode.

The second is that you are doing unnecessary pointer arithmetic. Why not—

 char buffer [BUFFERSIZE] ;

 ....

 if (buffer [ii] == WHATEVER)
user3344003
  • 20,574
  • 3
  • 26
  • 62