I'm trying to parallelize some code, but I noticed strange behavior in C++. I simplified the problem to the following: I have a huge array (1000 rows of 10,000,000 bytes each in the code below). When I write data to random positions in this array from a single thread, it is much faster than doing the same work in parallel (for example, on 10 cores). I assumed that, since RAM bandwidth is well above 1 GB/s, parallel writes to RAM should not be a problem. The code looks like this:
#include <iostream>
#include <type_traits>
#include <stdio.h>
#include <stdlib.h>
#include <cstdint>
#include <cstring>
#include <chrono>
#include <thread>
using namespace std;

// 16-byte pattern that gets copied into the array.
uint8_t g[16]{1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 10, 1};
// 1000 rows; each row is allocated in main().
uint8_t** data = new uint8_t*[1000];

// Worker: writes 32 bytes at a random position on every iteration.
void test() {
    for (int i = 1; i < 100000000; i++) {
        int row = rand() % 1000;
        // Leave room for the 32 bytes written below so the copy stays inside the row.
        int col = rand() % (10000000 - 31);
        memcpy(&data[row][col], &g[0], 16);
        memcpy(&data[row][col + 16], &g[0], 16);
    }
}

// Number of worker threads.
#define TH 1

int main() {
    for (int i = 0; i < 1000; i++) {
        data[i] = new uint8_t[10000000];
    }
    std::chrono::time_point<std::chrono::high_resolution_clock> m_beg = std::chrono::high_resolution_clock::now();
    std::thread* workers = new std::thread[TH];
    for (int i = 0; i < TH; i++) {
        workers[i] = std::thread(&test);
    }
    for (int i = 0; i < TH; i++) {
        workers[i].join();
    }
    // Elapsed wall-clock time in microseconds.
    double t = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::high_resolution_clock::now() - m_beg).count();
    cout << t << endl;
}
I compared two settings:
1. TH=1, test loop counter = 100M
2. TH=10, test loop counter = 10M
and the results are as follows:
1. 10 seconds
2. 72 seconds
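To put these timings next to the 1 GB/s figure, the effective write rate can be worked out from the loop itself: each iteration issues two 16-byte memcpy calls, so both settings write roughly 3.2 GB in total. The helper below is only an illustrative sketch; the name write_rate_mb_per_s is hypothetical and not part of the code above.

```cpp
#include <cstdint>

// Hypothetical helper: effective write rate in MB/s (1 MB = 1e6 bytes).
// Each loop iteration performs two 16-byte memcpy calls, i.e. 32 bytes.
double write_rate_mb_per_s(std::uint64_t threads,
                           std::uint64_t iterations_per_thread,
                           double seconds) {
    const std::uint64_t bytes_per_iteration = 2 * 16;
    const double total_bytes =
        static_cast<double>(threads * iterations_per_thread * bytes_per_iteration);
    return total_bytes / seconds / 1e6;
}
```

Calling write_rate_mb_per_s(1, 100000000, 10.0) gives 320 MB/s for the first setting, and write_rate_mb_per_s(10, 10000000, 72.0) gives about 44 MB/s for the second, so both runs sit below the 1 GB/s I assumed the RAM could sustain.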
Does anyone have any idea what the reason is?
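One experiment that might help narrow this down is to time the random-number generation on its own, without touching the array at all. rand() keeps hidden state that every thread shares, and it is not guaranteed to be thread-safe, so an implementation may serialize access to it. The sketch below compares rand() with a generator owned by each thread. The function names and the use of std::mt19937 are assumptions made for illustration; they are not part of the code above.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <random>
#include <thread>
#include <vector>

// Worker that draws numbers from rand(), whose state is shared by all threads.
void use_shared_rand(std::uint64_t iterations, std::uint64_t* sink) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < iterations; i++) {
        sum += static_cast<std::uint64_t>(rand() % 1000);
    }
    *sink = sum;  // store the result so the loop is not optimized away
}

// Worker that draws numbers from a generator owned by the calling thread
// (assumption: std::mt19937 as a stand-in for rand()).
void use_local_rng(std::uint64_t iterations, std::uint64_t* sink) {
    std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<int> dist(0, 999);
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < iterations; i++) {
        sum += static_cast<std::uint64_t>(dist(gen));
    }
    *sink = sum;
}

// Runs `threads` copies of `worker` and returns the elapsed wall-clock seconds.
double time_workers(void (*worker)(std::uint64_t, std::uint64_t*),
                    int threads, std::uint64_t iterations) {
    std::vector<std::uint64_t> sinks(static_cast<std::size_t>(threads), 0);
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < threads; i++) {
        pool.emplace_back(worker, iterations, &sinks[static_cast<std::size_t>(i)]);
    }
    for (auto& t : pool) {
        t.join();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Comparing time_workers(use_shared_rand, 10, 10000000) with time_workers(use_local_rng, 10, 10000000) would show whether the generator alone accounts for a noticeable part of the slowdown, independent of the memory writes.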