0

hi i have encountered a problem while trying to implement bubble sort in my work on a file with many records in C.

I have to ONLY modify my file and not use any temp files doing this. whenever i try to swap two lines in my file i manage to write the second line instead of the first one successfully, but when i try to write the first line instead of the second one it seems to override lines after the second one. i know that it is because the length of the lines are not identical, but is there a way maybe by padding the lines with empty chars or something to do this with success?

here is the code: (can't seem to upload the variable declaration for some reason...) the relevant part is in the if condition

for(j=0; j<linesNum-1; j++)
{
    for(k=0; k<linesNum-j-1; k++)
    {
        getline(&line1, &len, fp);
        pos = ftell(fp);
        getline(&line2, &len, fp);

        int i;
        char *tmp1 = (char*)malloc(sizeof(char)*strlen(line1));
        char *tmp2 = (char*)malloc(sizeof(char)*strlen(line2));
        strcpy(tmp1, line1);
        strcpy(tmp2, line2);

        token1 = strtok(tmp1, del);
        for(i=0; i<field; i++)
        {
            token1 = strtok(NULL, del);
        }
        token2 = strtok(tmp2, del);
        for(i=0; i<field; i++)
        {
            token2 = strtok(NULL, del);
        }

        if(strcmp(token1, token2) > 0)
        {
            fseek(fp, -(strlen(line1)+strlen(line2)), SEEK_CUR);
            fputs(line2, fp);
            //fsetpos(fp, &pos);
            fputs(line1, fp);

        }
        fsetpos(fp, &pos);
        free(tmp1);
        free(tmp2);
    }
}

}

from this file:

GNY1MJS2X,Mulan,Tony Bancroft,88,Harvey Fierstein,USA,7.5,Adventure
A1F9ENILS,Tropic Thunder,Ben Stiller,121,Steve Coogan,USA,7,Action
3YRMEJM1Y,The Departed,Martin Scorsese,151,Matt Damon,USA,8.5,Crime
NQMNMMOMR,The Girl with the Dragon Tattoo,David Fincher,158,Goran Visnjic,USA,7.8,Crime
FT2ULUNQ5,Die Hard with a Vengeance,John McTiernan,128,Aldis Hodge,USA,7.6,Action

I seem to get this:

A1F9ENILS,Tropic Thunder,Ben Stiller,121,Steve Coogan,USA,7,Action

GNY1MJS2X,Mulan,Tony Bancroft,88,Harvey Fierstein,USA,7.5,Adventure
YRMEJM1Y,The Departed,Martin Scorsese,151,Matt Damon,USA,8.5,Crime
NQMNMMOMR,The Girl with the Dragon Tattoo,David Fincher,158,Goran Visnjic,USA,7.8,Crime
FT2ULUNQ5,Die Hard with a Vengeance,John McTiernan,128,Aldis Hodge,USA,7.6,Action

which has an empty line between the first and the second line... and it also ruin the next lines.

Also, I sorted by one field delimited by comma, say the first field which is the movie ID.

sorry for the messy way to write the files

James K. Lowden
  • 7,574
  • 1
  • 16
  • 31
Eyal Arviv
  • 25
  • 1
  • 4
  • 3
    Where is the code you've got so far? We can't see what the problem is unless you include a [mcve] – Chris Turner Mar 21 '17 at 15:11
  • 1
    Please show your code, a sample of the output and what you expect the output to be. – JeremyP Mar 21 '17 at 15:11
  • You never mentioned the word "adjacent" in your description of the two lines you're swapping. Any data *between* the lines will require re-positioning if the lines are different lengths. If the lines are the same lengths, OR they are adjacent to one another, it becomes a trivial case. Anyway, can't help you without your code to identify where you things went south, so best of luck. – WhozCraig Mar 21 '17 at 15:17
  • Please show what you are getting – bichito Mar 21 '17 at 15:34
  • i added the code and yes i meant two adjacent rows every time. what do you mean by a trivial case? – Eyal Arviv Mar 21 '17 at 15:35
  • Please post the **entire** source file as per [mcve]. See [editing help](http://stackoverflow.com/editing-help). – n. m. could be an AI Mar 21 '17 at 15:53
  • There is NO need to cast the return of `malloc`, it is unnecessary. See: [**Do I cast the result of malloc?**](http://stackoverflow.com/q/605845/995714) for thorough explanation. `char *tmp1 = malloc (sizeof *tmp1 * strlen(line1));` is fine. You may want to read `man 3 fputs` to find out where the additional `'\n'` is coming from. – David C. Rankin Mar 21 '17 at 17:05

2 Answers2

1

The reason you are seeing corrupted output is that you are setting the position back to where line1 ended, even if the lines are swapped. This is a problem if line1 and line2 differ in length.

To minimize refactoring, you can seek backward the length of line1 on a swap and the length of line2 if you are not swapping.

Here is the update to your if statement, with len1 and len2 being the lengths of line1 and line2 respectively.

if(strcmp(token1, token2) > 0)
{        
    fseek(fp, -(len1 + len2), SEEK_CUR);
    fputs(line2, fp);
    fputs(line1, fp);
    fseek(fp, -len1, SEEK_CUR);
}
else
{
    fseek(fp, -len2, SEEK_CUR);
}

I also think you need to rewind in your outer loop if you want to go back through the file.

Marc Chiesa
  • 396
  • 3
  • 5
0

First, consider your algorithm.

You're always comparing line N with line N-1. After reading line 1, you always compare the current line (starting with line 2) to the previous one. Keep track of the offset of line N-1 in the file, because when you swap, you'll write line N to the N-1 position. You don't need to keep track of the current position, because after you seek and write the lines in swapped order, you'll be at the start of the next line.

So, you're algorithm becomes something like:

do 
    pos = 0
    read prev
    while read curr
        if curr < prev
            seek to pos
            write curr
            write prev
            copy prev to curr (for the next comparison)
            nswapped++
        pos = ftell
    rewind
until nswapped == 0  

I didn't try to debug your program because whatever is wrong is partly a function of complexity you've introduced. If you minimize the work you're doing, any errors will become more obvious.

James K. Lowden
  • 7,574
  • 1
  • 16
  • 31