I have a 2 dimensional array with fixed number of columns=4, but the number of rows is very large. I found out using vector<tuple>
instead of vector<array>
or was nearly 2x faster, and 5x faster than vector<vector>
. Though using a tuple
is quite a headache. I defined a access function to get around it:
typedef tuple<int,int,int,int> Tuple;
// access the i-th element
inline int &tget (Tuple &t, int i) {
switch (i) {
case 0: return std::get<0>(t);
case 1: return std::get<1>(t);
case 2: return std::get<2>(t);
case 3: return std::get<3>(t);
}
}
vector<Tuple> array;
// this gives the element in i-th row and j-th column
tget(array[i],j);
I found this quite surprising and was wondering where this performance boost coming from, and if there is a neat solution with a syntax similar to vector<vector>
or vector<array>
using subscript operator[]
with the same performance as tuple
?
UPDATE:
I'm using gcc4.8
with c++11
standards and I use -O3
for optimization.
Because people asked about the operations The operations are quite basic. First the 2d array is filled up with up to 1M rows. First I reserve the space and then use push_back
function to add elements. Then I sort the array, and finally I do a binary search on it.
This is the code block I wrote. The input will be several test cases (=T) and each test case takes an integer (N up to 20), and then reads N more integers in lens
. Then because of the exponential nature of the function the sums get quite big (4^(N/2) up to roughly a million). To test between array and tuple, toggle the typedef and comment/uncomment the return line in the access function. There are four lines/blocks that have to be commented/uncommented to switch back and forth between array
and tuple
, and the lines are designated in the comments (// if array
// if tuple
comments) :
#include <iostream>
#include <algorithm>
#include <vector>
#include <tuple>
#include <array>
using namespace std;
typedef vector<int>::iterator VecIt;
// HERE you can switch between the two types
typedef tuple<int,int,int,int> Tuple; // define it as a tuple
//typedef array<int,4> Tuple; // define it as array
inline int &tget (Tuple &t, int i) {
// return t[i]; // if it's an array, uncomment This too
// if it's an array comment the switch case
switch (i) {
case 0: return std::get<0>(t);
case 1: return std::get<1>(t);
case 2: return std::get<2>(t);
case 3: return std::get<3>(t);
default:
cerr << "tuple index out of bound" << endl;
}
}
void comp_sums(VecIt v, VecIt vend, vector<Tuple> &sums){
if (v==vend)
return;
int size = sums.size();
for (int i=0; i<size; i++) {
for (int j=1; j<4; j++){
sums.push_back(sums[i]);
tget(sums.back(),j) += *v;
}
tget(sums[i],0) += *v;
}
v++;
comp_sums(v, vend, sums);
}
void docase( ) {
int N;
cin >> N;
vector<int> lens(N);
int lsum = 0;
for (int i=0; i<N; i++) {
cin >> lens[i];
lsum += lens[i];
}
if (lsum % 4 != 0 ) {
cout << 0 << endl;
return;
}
lsum = lsum/4;
vector<Tuple> sums1, sums2;
Tuple z(0,0,0,0); // if it's a tuple
//Tuple z = {0,0,0,0}; // If it's an array
sums1.push_back(z); sums1.reserve(1<<N/2);
sums2.push_back(z); sums2.reserve(1<<N/2);
comp_sums(lens.begin()+1, lens.begin()+N/2, sums1);
comp_sums(lens.begin()+N/2+1, lens.end(), sums2);
sort(sums1.begin(), sums1.end());
sort(sums2.begin(), sums2.end());
long count = 0;
for (int i=0; i<sums1.size(); i++) {
for (int k=0; k<4; k++) {
//Tuple t({lsum,lsum,lsum,lsum}); // if it's an array
Tuple t(lsum,lsum,lsum,lsum); // if it's a tuple
for (int j=0; j<4; j++)
tget(t,j) -= tget(sums1[i],(j+k)%4);
tget(t,k) -= lens[0];
tget(t,0) -= lens[N/2];
bool neg = false;
for (int j=0; j<4; j++)
if (tget(t,j)<0){
neg = true;
break;
}
if (neg)
continue;
auto LB = lower_bound(sums2.begin(), sums2.end(), t);
auto UB = upper_bound(LB, sums2.end(), t);
count += (UB - LB);
}
}
cout << count/6 << endl;
}
int main() {
std::ios_base::sync_with_stdio(false);
int T;
cin >> T;
while (T--) docase();
}
Here is also a test input I use. The first line says there are T=18 separate test cases. First line for each test case is N (up to 20), and then N integers follow. Here it is. The algorithm has exponential growth so the small numbers don't mean it's only a few operations. Also in measuring time I have accounted for the I/O:
18
8
132391 123854 21536 19482 133025 10945 121800 10311
8
12 4 12 4 4 12 12 24
8
131723 16253 23309 132227 125171 12253 16757 136227
8
8 4 4 8 4 8 12 24
8
12 12 12 8 4 8 12 28
8
115021 114654 112093 17443 20371 17810 17274 115190
8
8 8 4 4 12 4 12 20
8
12 4 4 4 4 12 12 28
8
8 12 8 12 8 4 12 28
8
4 12 8 12 8 8 4 24
20
34836 56869 24112 27796 198091 196472 50364 27364 29572 53701 52981 25779 202644 204884 53840 30056 48705 53092 43751 18419
20
207447 26867 41968 35977 36384 57042 40981 30191 47705 26907 248361 26003 53715 48237 27691 192772 60507 58194 20754 245913
20
34836 56869 24112 27796 198091 196472 50364 27364 29572 53701 52981 25779 202644 204884 53840 30056 48705 53092 43751 18419
20
207447 26867 41968 35977 36384 57042 40981 30191 47705 26907 248361 26003 53715 48237 27691 192772 60507 58194 20754 245913
20
4 2 2 4 4 2 4 2 4 2 2 4 4 3 3 4 2 3 4 1
20
2 3 2 2 3 2 3 4 3 2 4 4 4 3 2 3 4 4 3 3
20
4 4 4 4 3 3 3 2 2 4 2 4 2 2 4 2 2 2 2 1
20
4 4 2 3 4 3 4 3 2 4 4 2 2 4 3 4 3 3 2 4