I'm implementing a class to store time-series data (OHLCV) which will contain methods applied to the parsed file. I'm trying to figure out whether there is a faster way to load the content of each file (.csv, each ≈ 40000 rows) into a std::unordered_map<std::string, OHLCV>, knowing that the structure of the file is fixed (order of the header):
.
├── file.csv
│
└── columns:
├── std::string datetime
├── float open
├── float high
├── float low
├── float close
└── float volume
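To make the layout concrete, a row of such a file would look like this (the values are purely illustrative):

```
datetime,open,high,low,close,volume
2020-01-02 09:30:00,321.16,321.55,320.94,321.39,1296.0
```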
The class is implemented as follows:
#include <ctime>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

class OHLCV {
private:
    const char* format = "%Y-%m-%d %H:%M:%S";
    std::vector<long int> timestamps;
    std::vector<float> opens, highs, lows, closes, volumes;
public:
    void upload(const std::string& filepath, char sep, bool header)
    {
        std::ifstream stream(filepath);
        if (stream) {
            std::string line, timestamp, open, high, low, close, volume;
            if (header) {
                std::getline(stream, line);  // skip the header row
            }
            while (std::getline(stream, line)) {
                std::stringstream ss(line);
                std::getline(ss, timestamp, sep);
                std::getline(ss, open, sep);
                std::getline(ss, high, sep);
                std::getline(ss, low, sep);
                std::getline(ss, close, sep);
                std::getline(ss, volume, sep);
                // parse the datetime string into a Unix timestamp
                std::tm tm{};
                std::istringstream(timestamp) >> std::get_time(&tm, format);
                timestamps.emplace_back(std::mktime(&tm));
                opens.emplace_back(std::stof(open));
                highs.emplace_back(std::stof(high));
                lows.emplace_back(std::stof(low));
                closes.emplace_back(std::stof(close));
                volumes.emplace_back(std::stof(volume));
            }
        }
    }
};
I ran a few tests to see how OHLCV::upload was performing, and these are some of the registered times:
[timer] ohlcv::upload ~ 338(ms), 338213700(ns)
[timer] ohlcv::upload ~ 329(ms), 329451900(ns)
[timer] ohlcv::upload ~ 345(ms), 345494100(ns)
[timer] ohlcv::upload ~ 328(ms), 328179800(ns)
Knowing that my optimization setting is currently at Maximum Optimization (Favor Speed) (/O2) and that I'm testing in Release mode, could I improve the speed of the upload without using a std::array with a const unsigned int MAX_LEN known at compile time?
A small note: Pandas (Python) takes ≈ 63 ms to load one of these files.