There is a file of 1 GB contains a single string of characters. As the string is very large it can't be loaded completely to memory. What is the best way to reverse this string?
Asked
Active
Viewed 858 times
2
-
Please clarify. The title says sort, but the question says reverse. – Raymond Chen Jun 18 '12 at 03:32
-
You need to provide a bit more detail about the structure of the file. Is it text? How are the "words" to be sorted delimited (by line? whitespace? or do you just want the byte values sorted?) – John Carter Jun 18 '12 at 03:33
-
1U smelled it wrong. Not a homework. Found the problem to be interesting, so posted it here. Also, I couldn't think of a solution, so Im looking out for some good answers. – Manik Sidana Jun 18 '12 at 04:07
5 Answers
4
load blocks into memory, iterate through them in reverse while writing them out in order. pseudocode:
load_block(buffer, 4mb, end of file); // Load a 4mb block from the end
for (i = 4mb; i>=0; i--) {
write(buffer[i],1); // Write it out in reverse
}

std''OrgnlDave
- 3,912
- 1
- 25
- 34
-
+1 (Although, it might be more efficient to reverse the block in memory, and then write that block in one go.) – huon Jun 18 '12 at 04:29
-
@dbaupp the C I/O library has a buffer, and then underlying OS has a buffer. you're right, though, reversing in memory is likely faster if only for the lowered amount of function calls. – std''OrgnlDave Jun 18 '12 at 04:47
-
1@std''OrgnlDave: syscalls are rather expensive, so you'd save a heap of time (context switches!) by reversing in memory. – Asherah Jun 18 '12 at 11:48
-
@Len syscalls aren't made untill the C library buffer is filled. that's beside the point, I thought it'd be clearer in memory. I could even show an in-place reversal, but that would be even *less* clear – std''OrgnlDave Jun 18 '12 at 18:21
-
1@std''OrgnlDave: but `write(2)` ***is*** the system call. You need to use the `FILE *` operations `fopen(3)`, `fwrite(3)` and co. if you want buffering from a library. – Asherah Jun 18 '12 at 22:22
1
You can iterate through the file from the end, loading it to memory byte by byte (assuming 8bit characters), saving it to the output file started from the beginning

secretformula
- 6,414
- 3
- 33
- 56
-
The OP did not say that the file contains 'lines'. Could be a binary file. – Eric J. Jun 18 '12 at 03:31
-
-
1"**very big** sequence of characters" implies it is *not* a line to me. Perhaps the OP will weigh in. – Eric J. Jun 18 '12 at 03:33
-
The file contains a SINGLE BIG sentence accomodating 1GB of space. – Manik Sidana Jun 18 '12 at 03:33
1
quick and dirty way of doing it.
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main(int argc, char ** argv) {
FILE * in,*out;
assert(argc>2);
in = fopen(argv[1],"rb");
out = fopen(argv[2],"wb");
assert(in);
assert(out);
assert(0==fseek(in,0,SEEK_END));
assert(0==fseek(in,-1,SEEK_CUR));
fputc(fgetc(in),out);
while (!fseek(in,-2,SEEK_CUR)) {
fputc(fgetc(in),out);
}
fclose(in);
fclose(out);
return 0;
}
added per comment.
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
void flip(char *a, int size) {
int i;char c;
for (i=0;i<size/2;++i) {
c=a[size-1-i];
a[size-1-i] = a[i];
a[i]=c;
}
}
int main(int argc, char ** argv) {
const size_t chunksize = 4096;
char buffer[chunksize];
size_t chunks;
size_t rest;
FILE * in,*out;
size_t t;
assert(argc>2);
in = fopen(argv[1],"rb");
out = fopen(argv[2],"wb");
assert(in);
assert(out);
assert(0==fseek(in,0,SEEK_END));
t = ftell(in);
assert(t>0);
chunks = t/chunksize;
rest = t%chunksize;
assert(0==fseek(in,-rest,SEEK_CUR));
assert(rest == fread(buffer, 1, rest, in));
flip(buffer, rest);
assert(rest == fwrite(buffer,1,rest,out));
while (!fseek(in,-(chunksize+rest),SEEK_CUR)) {
rest = chunksize;
assert(rest == fread(buffer, 1, rest, in));
flip(buffer, rest);
assert(rest == fwrite(buffer,1,rest,out));
}
fclose(in);
fclose(out);
return 0;
}

pizza
- 7,296
- 1
- 25
- 22
0
I think, the best option for your problem is to use mmap to load the file in to memory. See mmap man page to load the file or read the following link When should I use mmap for file access?
-
1I think the link contradicts with what you want to say. Read the below excerpt from the link: One place mmap can be awkward is if you need to work with very large files on a 32 bit machine. This is because mmap has to find a contiguous block of addresses in your process's address space that is large enough to fit the entire range of the file being mapped. This can become a problem if your address space becomes fragmented, where you might have 2 GB of address space free, but no individual range of it can fit a 1 GB file mapping. – Manik Sidana Jun 18 '12 at 06:00
0
Divide the string into sub-strings that your memory can handle.Read those sub-strings from the end. Reverse each sub-string and the print the result to output file.

mohit
- 5,696
- 3
- 24
- 37