
EDIT: OK, I hear you guys. I've isolated the part of my code that's causing the problem, compiled it, and made sure it still gives the same result. Here it is. As before, the segfault appears after the first iteration of the for loop, on strcpy(replace[j]->utf8, strtok(data, "\t")); Thanks again!

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>

#define max_chars 45
#define max_UTF 5
#define max_ASCII 7
#define max_word_length 30
#define max_line_length 70
#define max_texto_line 5000

typedef struct {
    char utf8[max_UTF];
    char ascii_seq[max_ASCII];
    int count;
} Replac;


void getTable(FILE *f, char inputfile[],Replac **replace){
    char data[max_line_length];
    int j;
    f = fopen( inputfile, "r" );
    if (f == NULL) {
        fprintf(stderr, "Can't open input file %s!\n",inputfile);
        exit(1);
    }
    
    fgets(data,sizeof data,f);
    for(j=0 ; strcmp(data,"\n") ; fgets(data,sizeof data,f), j++){  
        if (feof(f)) {                                      
            break;
        }
        strcpy(replace[j]->utf8, strtok(data, "\t"));                   
        strcpy(replace[j]->ascii_seq, strtok(NULL, "\n"));
    }
    fclose(f);
}

int main( int argc, char *argv[] ){
    Replac *replace=malloc(max_chars * sizeof(Replac));
    FILE *fpr,*f,*fpw;
    int carprocess = 0;
    setlocale(LC_ALL,"pt_PT.UTF-8");
    setlocale(LC_COLLATE,"pt_PT.UTF-8");
    
    
    getTable(f,argv[1],&replace);
}

The text file that I'm copying the characters from is formatted like this:

UTFCHAR <TAB> asciichar

For example:

Á   'A

END EDIT


So I'm a beginner in C, and I've tried everything I could think of. This seems like a pretty straightforward thing to do, so the fact that I'm having this much trouble clearly shows I have some gap in my knowledge...

I won't bother you with the full code, since it works perfectly; it's just that I wanted to do things differently, and that's when the trouble started.

In short, I'm writing a program that collects a set of UTF-8 characters and their ASCII replacements and stores them in a struct such as

typedef struct {
    char utf8[max_UTF];
    char ascii_seq[max_ASCII];
} Replac;

Then in main I did the malloc like this:

Replac *replace=malloc(max_chars * sizeof(Replac));

If my thought process is correct, this allocates a block of memory and makes replace point to its starting address.

Then I wrote a function that reads a few UTF-8 characters and their replacements and stores them in the struct, something like

void getTable(FILE *f, char inputfile[],Replac **replace)

Now, following the debugger, it seems that I'm creating a new variable replace of type Replac** at a completely different address, and stored at that address is the address of the original malloc'ed struct that I passed as the parameter.

After that I do a

strcpy(replace[0]->utf8, something I got from the table);

Following the debugger and searching through the memory addresses, I see that the first time I do this, the first position of the malloc'ed struct is indeed filled with the right data.

followed by

strcpy(replace[0]->ascii_seq, corresponding ascii sequence to the previous UTF8 char);

and that fills the next memory position in the memory block.

So while debugging I get something like this in my variables watch:

address replace = (Replac **) 0xbf8104fc that contains 0x0878a008

address *replace = (Replac *) 0x0878a008 that contains the whole struct, so at address 0x0878a008 I get the data of the UTF-8 char, and at address 0x0878a00d I get the ASCII sequence.

The problem is on the next iteration of the loop, when it's time to

strcpy(replace[1]->utf8, something I got from the table);

I get a segmentation fault after that instruction.

So what do you guys think? Am I approaching this correctly and just getting tripped up by syntax or something like that, or is my basic knowledge flawed?

Thanks, and belated happy holidays!

Crisapx
  • Without true code, this code description is challenging, and IMO, insufficient to determine the segmentation fault. It would be better for all to see the true code. – chux - Reinstate Monica Dec 27 '16 at 23:38
  • Welcome to SO. Please look into this https://stackoverflow.com/help/mcve to create a minimal example of your code producing the error. – Jens Gustedt Dec 28 '16 at 00:12
  • You can't use `Replac **replace` like that; you have only allocated one `Replac`. http://stackoverflow.com/questions/12462615/how-do-i-correctly-set-up-access-and-free-a-multidimensional-array-in-c – Stargateur Dec 28 '16 at 02:26

2 Answers

f = fopen( inputfile, "r" );
...
typedef struct 
{
    char utf8[max_UTF];
    char ascii_seq[max_ASCII];
    int count;
} Replac;
...
fgets(data,sizeof data,f);

You are mixing binary and text format.

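Depending on the compiler, sizeof(Replac) will typically be 16: the two char arrays take 5 + 7 = 12 bytes, and sizeof(int) is usually (but not always) 4. There may also be padding, because count must start at a multiple of int's alignment and the struct's size is rounded up to a multiple of its alignment.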

If your data is stored as text, then it will be something like this:

ABCDE\tABCDEFG123456\n

Note that the length of an integer written in decimal is not fixed (from 1 up to 11 characters for a 32-bit int, counting the minus sign). And there are (or there should be) newline \n characters.

So you don't want to read exactly 16 characters. You want to write and then read 3 lines for each record. Example:

ABCDE\n
ABCDEFG\n
123456\n

If you are reading binary data, open the file in binary mode and use fwrite and fread. Example:

f = fopen( inputfile, "rb" );
Replac data;
fread(&data, sizeof(data), 1, f);

This all depends on how your file was created. If you are writing the file yourself, then show the code you used for writing the data.

Also, ASCII is a subset of Unicode. A in ASCII has the exact same representation as A in UTF8.

Barmak Shemirani
  • I'm not writing anything to the file. The file is simply a UTF-8 char followed by a tab followed by an ASCII representation, for example Ç tab ,C or £ tab pound. – Crisapx Dec 28 '16 at 12:40
        strcpy(replace[j]->utf8, strtok(data, "\t"));                   

I get a segmentation fault after that instruction.

You just got the dereferencing order wrong. You first subscripted with [j] and then dereferenced with ->, as if we had an array of pointers to Replacs. But what we have is a pointer to (the first element of) an array of Replacs, hence we must dereference the pointer first and subscript afterwards, i.e. instead of

                replace[j]->utf8

we have to write

                (*replace)[j].utf8

or the equivalent

                (*replace+j)->utf8
Armali