The only way to increase your "speed algorithm" without touching it is to manually bufferise printf.
printf is a function that will do a system call at some time.
Every system call is costly, that's why each function that do some kind of system call usually do "buffering".
For malloc, in reallity, you allocate much more that you think (malloc(1) will not end up by allocating 1 octet but much more under the hood) and when you free the memory, in reallity it's not really released (that way, if you do another malloc, you will not do a system call but reuse the memory freed). Of course, it's OS dependant AND inmplementation dependend (all is under the hood).
You can see some system call under linux by using "strace".
The same thing apply to "printf" : since it will do a system call, there a buffer that retain what you want to print and do the print only time to time.
So when the printf's buffer is really printed ?
We can't know, it's implementation defined (event the man page of printf doesn't say a word about the printf buffering), but usually, it can be at 3 occasion :
- when the buffer is full
- when you force the flush by using fflush
- when you have the '\n' caracter in what you want to print.
Since you do an "\n" at each subnet, printf may have to do the system call every time : it's time consumming.
By using a buffer and print in the buffer instead of stdout, you can speed up your code :
#define BUF_LEN 2048
typedef struct mybuf {
char buffer[BUF_LEN];
size_t len;
} mybuf;
// For convenience, I use global varaible, but it's BAD
mybuf buf = {.buffer = "", .len = 0};
void MyBuf_PrintOnStdout(void)
{
write(1, buf.buffer, buf.len);
buf.len = 0;
}
void MyBuf_Flush(void)
{
MyBuf_PrintOnStdout();
fflush(stdout);
}
void MyBuf_PrintInteger(int integer)
{
int printedLen;
// An 64bit integer take 20digit + 1 char for potential "-"
// So 21 is the max len for an integer.
// Of course, if your int is bigger than 64bit, this if is false.
if (buf.len + 21 >= BUF_LEN) {
MyBuf_PrintOnStdout();
}
printedLen = sprintf(buf.buffer + buf.len, "%d", integer);
// Error check (printedLen < 0 == error)
buf.len += printedLen;
}
void MyBuf_PrintCharacter(char character)
{
if (buf.len + 1 >= BUF_LEN) {
MyBuf_PrintOnStdout();
}
buf.buffer[buf.len] = character;
++buf.len;
}
void subset(int n, int k, int arr[], int pos[], int index, int start)
{
if (index == k) {
for (int j = 0; j < k; ++j) {
MyBuf_PrintInteger(pos[j]);
MyBuf_PrintCharacter(j == k-1 ? '\n' : ' ');
}
return;
}
for(int i = start; i<n; i++){
pos[index] = arr[i];
subset(n, k, arr, pos, index+1, i+1);
}
}
Don't forget to call "MyBuf_Flush" at the end, because without it you will probably missing some printing.
Edit : With the complet code, I do some testing.
While it's true there is improvement (N = 30, k = 20 with your code take ~88s and with the write take ~78s), it really too poor to make that work on less than 1s.
Is it possible to resolve your problem without having a supercalculator ?
Another edit : Okay, I have confused the meaning of "should" and "must", sorry. English is not my motherlanguage. (I thinked that you must use recursion).
Since you are free to not use recursion, here something interesting :
I've implemented recursion and not recursion of n=30, k=20.
For each implementation, I have enable and disabled the printing.
The result are clear :
- recursion, printing with printf : ~88s
- recursion, printing buffered : ~78s
- recursion, no printing : ~7s
--
- no recursion, printing with printf : ~80s
- no recursion, printing buffered : ~70s
- no recursion, no printing : ~0,47s
So in conclusion, it's more the printing part that is really taking time than finding the solution itself.
Here the no recursive implementation :
#define BUF_LEN 4096
typedef struct mybuf {
char buffer[BUF_LEN];
size_t len;
} mybuf;
// For convenience, I use global varaible, but it's BAD
mybuf buf = {.buffer = "", .len = 0};
void MyBuf_PrintOnStdout(void)
{
/*buf.buffer[buf.len] = '\0';
printf("%s", buf.buffer);*/
write(1, buf.buffer, buf.len);
buf.len = 0;
}
void MyBuf_Flush(void)
{
MyBuf_PrintOnStdout();
fflush(stdout);
}
void MyBuf_PrintInteger(int integer)
{
int printedLen;
if (buf.len + 21 >= BUF_LEN) {
MyBuf_PrintOnStdout();
}
printedLen = sprintf(buf.buffer + buf.len, "%d", integer);
// Error check (printedLen < 0 == error)
buf.len += printedLen;
}
void MyBuf_PrintCharacter(char character)
{
if (buf.len + 1 >= BUF_LEN) {
MyBuf_PrintOnStdout();
}
buf.buffer[buf.len] = character;
++buf.len;
}
void subset_no_recursion(int n, int k)
{
int pos[k];
for (int i = 0; i < k; ++i) {
pos[i] = k - i;
}
for (;;) {
// Last digit incrementation
while (pos[0] <= n) {
/* print
for (int i = k - 1; i >= 0; --i) {
MyBuf_PrintInteger(pos[i]);
MyBuf_PrintCharacter(i == 0 ? '\n' : ' ');
}*/
++pos[0];
}
// We find where we can increment the digit without overflow N
int pivot = 1;
while (pos[pivot] == n - pivot) {
++pivot;
}
if (pivot == k) {
return;
}
++pos[pivot];
while (pivot) {
pos[pivot - 1] = pos[pivot] + 1;
--pivot;
}
}
}
void subset_recursion(int n, int k, int pos[], int index, int start)
{
if (index == k) {
for (int i = 0; i < k; ++i) {
MyBuf_PrintInteger(pos[i]);
MyBuf_PrintCharacter(i == k-1 ? '\n' : ' ');
}
return;
}
for (int i = start; i < n; i++) {
pos[index] = i + 1;
subset_recursion(n, k, pos, index + 1, i + 1);
}
}
#define N 30
#define K 20
int main(void)
{
int pos[K];
time_t clockstart;
time_t clockend;
clockstart = clock();
subset3(N, K);
clockend = clock();
printf("%f\n", ((float)(clockend - clockstart)) / CLOCKS_PER_SEC);
return 0;
}