It looks like fgets() includes the newline in the string it returns. Is there a function that reads a line without the newline? Also, is there a version of fgets() that reads all the tokens on the line and returns pointers to them?
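fgets() itself always keeps the newline, but a common standard-C idiom is to cut it off afterwards with strcspn(), and strtok() can then split the line in place into tokens. A minimal sketch combining both; read_tokens is a hypothetical name, and splitting on spaces and tabs is an assumption:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a standard function): read one line with
   fgets(), remove the newline, and split the line into tokens separated
   by spaces or tabs. Stores up to maxtok token pointers in tok[] and
   returns the number stored, or -1 at end of file or on a read error.
   The pointers point into buf, so they stay valid only until buf is
   reused. */
int read_tokens(char *buf, int size, FILE *fp, char **tok, int maxtok) {
    if (fgets(buf, size, fp) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if any */

    int n = 0;
    /* strtok() writes '\0' over the separators and is not reentrant */
    for (char *p = strtok(buf, " \t"); p != NULL && n < maxtok;
         p = strtok(NULL, " \t"))
        tok[n++] = p;
    return n;
}
```

As with fgets(), a line longer than size - 1 characters is returned in pieces over several calls.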
The code below seems to work, but it appears to be about 10-20% slower than fgets(), maybe because it reads the characters one by one (?)
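```c
#include <stdio.h>

// Read characters from file and exclude terminating newline
// Input argument 'size' must be an integer larger than one
char *fgetsnn(char *s0, int size, FILE *fp) {
    int c;                      // int, not char: fgetc() returns EOF as an int
    char *s = s0;
    char *se = s0 + size - 1;   // leave room for the terminating '\0'
    while (s < se) {
        c = fgetc(fp);
        switch (c) {
        case EOF:
            *s = '\0';
            if (ferror(fp) != 0) {
                return NULL;    // read error
            }
            if (s == s0) {
                return NULL;    // end of file before any character was read
            } else {
                return s0;      // last line had no terminating newline
            }
        case '\n':
            *s = '\0';          // drop the newline
            return s0;
        case '\0':
            *s = '\0';
            return NULL;        // an embedded '\0' is treated as an error
        default:
            *s = (char)c;
            s++;
        }
    }
    *s = '\0';                  // buffer full: the rest of the line stays in the stream
    return s0;
}
```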
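Since fgets() is the faster reader in this comparison, one option is to keep it and only strip the newline afterwards, so no per-character loop runs in user code. A sketch under that assumption (fgetsnn_fgets is a hypothetical name); unlike fgetsnn() it does not treat an embedded '\0' as an error:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical drop-in alternative to fgetsnn(): let fgets() do the
   reading and only remove the trailing newline afterwards. Returns NULL
   at end of file with nothing read or on a read error, like fgets(). */
char *fgetsnn_fgets(char *s0, int size, FILE *fp) {
    if (fgets(s0, size, fp) == NULL)
        return NULL;
    size_t len = strlen(s0);
    if (len > 0 && s0[len - 1] == '\n')
        s0[len - 1] = '\0';          /* drop the newline, if present */
    return s0;
}
```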
It looks like consuming the file buffer in smaller pieces makes the reading slower. I guessed it would then be fastest to read the whole file at once and split it in memory instead, but when I tried that it didn't seem to help at all. The 'drawback' here is that the most likely case, 'default', comes last in the switch, so the other cases may be tested before it for every character. (C itself allows 'default' anywhere in a switch, and compilers often turn a switch into a jump table or a binary search rather than testing the cases in source order.) Then the most likely reason for the speed limit is the hardware (hard disk) speed.
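A sketch of the whole-file approach: read the stream into one growing buffer with fread(), then replace each '\n' with '\0' and keep a pointer to the start of each line. load_lines and its parameters are hypothetical, and the 4096-byte starting size is an arbitrary choice:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read the whole stream into one malloc'd buffer
   and split it into lines in place. On success returns the buffer (the
   caller frees it), stores up to maxlines line pointers in lines[] and
   the count in *nlines. Returns NULL on allocation or read error. */
char *load_lines(FILE *fp, char **lines, size_t maxlines, size_t *nlines) {
    size_t cap = 4096, len = 0, got;
    char *buf = malloc(cap + 1);          /* +1 for the final '\0' */
    if (buf == NULL)
        return NULL;

    while ((got = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += got;
        if (len == cap) {                 /* buffer full: double it */
            char *tmp = realloc(buf, 2 * cap + 1);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) { free(buf); return NULL; }
    buf[len] = '\0';

    size_t n = 0;
    char *p = buf, *end = buf + len;
    while (p < end && n < maxlines) {
        char *nl = memchr(p, '\n', (size_t)(end - p));
        lines[n++] = p;
        if (nl == NULL)
            break;                        /* last line without '\n' */
        *nl = '\0';                       /* newline becomes terminator */
        p = nl + 1;
    }
    *nlines = n;
    return buf;
}
```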
In the version below, the most likely case (ordinary characters) is tested first with an if statement instead of a case label:
```c
#include <stdio.h>

// Read characters from file and exclude terminating newline
// Input argument 'size' must be an integer larger than one
char *fgetsnn(char *s0, int size, FILE *fp) {
    int c;                      // int, not char, so EOF is detected reliably
    char *s = s0;
    char *se = s0 + size - 1;   // leave room for the terminating '\0'
    while (s < se) {
        c = fgetc(fp);
        if (c > '\n') {         // most common case first: ordinary characters
            *s = (char)c;
            s++;
        } else {
            switch (c) {
            case '\n':
                *s = '\0';      // drop the newline
                return s0;
            case EOF:
                *s = '\0';
                if (ferror(fp) != 0) {
                    return NULL;
                }
                if (s == s0) {
                    return NULL;
                } else {
                    return s0;
                }
            case '\0':
                *s = '\0';
                return NULL;
            default:            // control characters below '\n', e.g. '\t'
                *s = (char)c;
                s++;
            }
        }
    }
    *s = '\0';
    return s0;
}
```
If the ASCII code of newline were 1, the single test could pass all the ordinary characters quickly (including the ones that currently fall between 1 and '\n')... And maybe ' ' (space) should have code 2 and '\t' code 3, to make splitting the string into tokens faster as well...
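Changing ASCII is not necessary to get a single test per character: a 256-entry lookup table indexed by the byte value can map newline, space and tab to whatever small class numbers are convenient. A minimal sketch; char_class, the CH_* classes and split_tokens are hypothetical names, and whether this beats the if/switch version would have to be measured:

```c
#include <stddef.h>

/* Hypothetical character classes; CH_OTHER must be 0 so that every
   byte not listed below defaults to an ordinary character. */
enum { CH_OTHER = 0, CH_SPACE = 1, CH_EOL = 2 };

/* 256-entry table indexed by byte value (C99 designated initializers). */
static const unsigned char char_class[256] = {
    ['\0'] = CH_EOL, ['\n'] = CH_EOL, ['\r'] = CH_EOL,
    [' ']  = CH_SPACE, ['\t'] = CH_SPACE,
};

/* Split the line s in place into tokens separated by spaces/tabs,
   stopping at '\0', '\n' or '\r'. Stores up to maxtok pointers in
   tok[] and returns the number of tokens stored. */
size_t split_tokens(char *s, char **tok, size_t maxtok) {
    unsigned char *p = (unsigned char *)s;
    size_t n = 0;
    for (;;) {
        while (char_class[*p] == CH_SPACE)   /* skip separators */
            p++;
        if (char_class[*p] == CH_EOL || n == maxtok)
            break;
        tok[n++] = (char *)p;
        while (char_class[*p] == CH_OTHER)   /* one lookup per ordinary char */
            p++;
        if (char_class[*p] == CH_EOL) {
            *p = '\0';                       /* last token ends the line */
            break;
        }
        *p++ = '\0';                         /* end token at the separator */
    }
    return n;
}
```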