
I am trying to write a config parser class in C++. First, here is a snippet of my class:

class foo{
private:
    struct st{
        std::vector<std::pair<std::string,std::string>> dvec;
        std::string dname;
    };
    std::vector<st> svec;
public:
    //methods of the class
};

int main(){
    foo f;
    //populate foo
}

I will populate the vectors with data from a file. The file contains text with delimiters, and I'll split the text into strings at those delimiters. I know the exact size of the file, and since I am keeping all data as character strings, it's safe to assume the vector svec will take the same memory space as the file size. However, I don't know how many strings there will be. For example, I know the file size is 100 bytes, but I don't know whether it's 1 string of 100 characters, 10 strings of 10 characters each, or something else.

I would like to avoid reallocation as much as possible. But std::vector::reserve() and std::vector::resize() both take their argument as a number of elements, and that is my problem: I don't know how many elements there will be, only how many bytes they will need. I dug around a bit but couldn't find anything.

I am guessing I will be cursed if I try this: std::vector<st> *svec = (std::vector<st> *) malloc(filesize);

Is there any way to reserve memory for vector in terms of bytes instead of number of elements? Or some other workaround?

Thank you for your time.

Edit: I have already written the entire code and it's working. I am just looking for ways to optimize it. The entire code is too long to post, so here is the repository link for anyone interested: https://github.com/Rakib1503052/Config_parser

For the relevant part of the code:

class Config {
private:
    struct section {
        std::string sec_name;
        std::vector<std::pair<std::string, std::string>> sec_data;
        section(std::string name)
        {
            this->sec_name = name;
            //this->sec_data = data;
        }
    };
    //std::string path;
    std::vector<section> m_config;
    std::map<std::string, size_t> section_map;

public:
    void parse_config(std::string);
    //other methods
};

void Config::parse_config(string path)
{
    struct stat buffer;
    bool file_exists = (stat(path.c_str(), &buffer) == 0);

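    // runtime_error (from <stdexcept>) is used because std::exception has no standard constructor that takes a message.
    if (!file_exists) { throw runtime_error("File does not exist in the given path."); }
    else {
        ifstream FILE;
        FILE.open(path);

        if (!FILE.is_open()) { throw runtime_error("File could not be opened."); }

        string line_buffer, key, value;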
        char ignore_char;
        size_t current_pos = 0;

        //getline(FILE, line_buffer);

        while (getline(FILE, line_buffer))
        {
            if (line_buffer == "") { continue; }
            else{
                if ((line_buffer.front() == '[') && (line_buffer.back() == ']'))
                {
                    line_buffer.erase(0, 1);
                    line_buffer.erase(line_buffer.size() - 1);
                    this->m_config.push_back(section(line_buffer));
                    current_pos = m_config.size() - 1;
                    section_map[line_buffer] = current_pos;
                }

                else
                {
                    stringstream buffer_stream(line_buffer);
                    buffer_stream >> key >> ignore_char >> value;
                    this->m_config[current_pos].sec_data.emplace_back(key, value);
                }
            }
        }

        FILE.close();
    }
}

It reads an INI file of the following format:

[section1]
key1 = value1
key2 = value2

[section2]
key1 = value1
key2 = value2
.
.
.
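For a file of this shape, one way to get a single allocation sized in bytes is to read the whole file into one std::string and store std::string_view slices into it, as the comments about std::string_view and pointers to the raw data suggest. The following is only a sketch under assumptions: SectionView, trim, read_all and parse_views are hypothetical names that are not in my repository, it needs C++17, and unlike the `>>` based parser above it keeps spaces inside a value.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical view-based section: every view points into the file buffer.
struct SectionView {
    std::string_view name;
    std::vector<std::pair<std::string_view, std::string_view>> fields;
};

// Strip leading and trailing spaces, tabs and carriage returns.
std::string_view trim(std::string_view s) {
    const char* ws = " \t\r";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string_view::npos) return {};
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Read an entire stream into one string (the only copy of the text).
std::string read_all(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Parse `text` in place. The returned views are valid only while the
// buffer behind `text` is alive and unmodified.
std::vector<SectionView> parse_views(std::string_view text) {
    std::vector<SectionView> out;
    while (!text.empty()) {
        const auto nl = text.find('\n');
        const std::string_view line = trim(text.substr(0, nl));
        text.remove_prefix(nl == std::string_view::npos ? text.size() : nl + 1);
        if (line.empty()) continue;
        if (line.front() == '[' && line.back() == ']') {
            assert(line.size() >= 2);  // "[" alone cannot match both tests
            out.push_back(SectionView{line.substr(1, line.size() - 2), {}});
        } else if (!out.empty()) {     // ignore fields before any section
            const auto eq = line.find('=');
            if (eq == std::string_view::npos) continue;
            out.back().fields.emplace_back(trim(line.substr(0, eq)),
                                           trim(line.substr(eq + 1)));
        }
    }
    return out;
}
```

The character data is then allocated exactly once, at the size of the file. The vectors of views still grow by element count, so this removes the per-string allocations but not the vector reallocations.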

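However, after some more digging I found out that std::string works differently than I thought. Simply put, the character data is not stored in the vector. The vector holds only the fixed-size std::string objects, and each of those manages its own separately allocated character buffer (short strings may be stored inside the object itself). This makes my case moot.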
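A small check that illustrates this, as a sketch only: element_buffer_bytes, chars_live_outside_object and demo are hypothetical helper names introduced here. The 1000-character string is chosen because it is far larger than sizeof(std::string) and so cannot be stored inside the object.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Bytes occupied by the vector's own element buffer: capacity * sizeof(T).
// The character data owned by each std::string is not part of this.
std::size_t element_buffer_bytes(const std::vector<std::string>& v) {
    return v.capacity() * sizeof(std::string);
}

// True if the characters are stored outside the std::string object itself,
// that is, in a separate allocation. std::less gives a well-defined
// ordering for pointers into unrelated objects.
bool chars_live_outside_object(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    std::less<const char*> before;
    return before(s.data(), obj_begin) || !before(s.data(), obj_end);
}

void demo() {
    std::vector<std::string> v;
    v.reserve(10);  // room for 10 string objects, whatever their contents
    const std::size_t bytes_before = element_buffer_bytes(v);

    v.emplace_back(1000, 'x');  // a 1000-character string

    // The vector's buffer did not grow: only one more slot is in use.
    assert(element_buffer_bytes(v) == bytes_before);
    // The 1000 characters live in the string's own allocation.
    assert(chars_live_outside_object(v.back()));
}
```

So reserving the vector by file size would not cover the string contents, and each long string still allocates on its own.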

I'll keep the question here for anyone interested. In particular, if the data is changed to fixed-size types like int or double, the question still stands, and it has a great answer here: https://stackoverflow.com/a/18991810/11673912
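For such fixed-size types, a byte budget converts to an element count by dividing by sizeof(T). This is a minimal sketch of that idea, written independently of the linked answer; reserve_bytes is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical helper: reserve room for as many T as fit into `bytes`.
// Only meaningful when all the data of a T lives inside the object itself,
// which is why it is restricted to trivially copyable types here.
template <typename T>
void reserve_bytes(std::vector<T>& v, std::size_t bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "a byte budget only maps to a count for fixed-size data");
    const std::size_t count = bytes / sizeof(T);  // rounds down
    assert(count <= v.max_size());
    v.reserve(count);
}
```

For example, `reserve_bytes(values, filesize)` on a `std::vector<double>` reserves `filesize / sizeof(double)` elements. The vector may allocate more than requested, since reserve() only guarantees a minimum capacity.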

Feel free to share other opinions.

Abdur Rakib
  • I’m not sure I understand the usage scenario, and so I cannot judge your data structure. Are you making a lookup table, like environment strings/INI file stuff? – Dúthomhas May 06 '22 at 03:18
  • Sounds like what you want is a custom allocator. With one, you can request the number of bytes up front in the allocator's constructor, and then vector and string can use that allocator for their allocations, so there is no further memory request because the allocator has already made it. – NathanOliver May 06 '22 at 03:19
  • You're likely overthinking it. Finish your task by inserting items line by line. After you are done, the means to optimize insertion into the vector will be more obvious. I suspect you will need to count the number of lines in the file first before you reserve an item count. – selbie May 06 '22 at 03:19
  • `std::string` itself is a fixed size class. So how much memory is used in your vector depends only on how many elements it has. – QuarticCat May 06 '22 at 03:20
  • And for the parsing purpose, I guess you might need `std::string_view` to avoid unnecessary memory allocation and copy. – QuarticCat May 06 '22 at 03:21
  • "it's safe to assume the vector svec will take the same memory space as the file size." Um, I don't see why this is a safe assumption. It will almost certainly require more space than the file size. – Raymond Chen May 06 '22 at 04:55
  • "*it's safe to assume the vector svec will take the same memory space as the file size*" - that is incorrect. The memory size requirement of the `vector` is dependent only on the number of elements it holds. And the memory size requirement of each `std::string` object is separate from the memory size requirement of the data each string holds. – Remy Lebeau May 06 '22 at 09:18
  • @Dúthomhas, Yes, a lookup table from INI file. – Abdur Rakib May 08 '22 at 06:56
  • @NathanOliver, That would be a good thing. But my knowledge of allocator is very limited. Could you point to an easy to understand sample or documentation about custom allocator? – Abdur Rakib May 08 '22 at 06:58
  • @selbie, The code was already finished and working. I was looking for possible optimization. – Abdur Rakib May 08 '22 at 06:58
  • @RaymondChen, @RemyLebeau, Since the overhead is constant, I am excluding it for the moment. In practice, adding a reasonable margin to the file size should do. Something like `filesize+1024`. – Abdur Rakib May 08 '22 at 07:01
  • The overhead is not a constant. It depends on how many keys there are in the file. What you can do is have a vector of pointers to the raw data. – Raymond Chen May 08 '22 at 14:02
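
As a rough illustration of the custom allocator idea from the comments, the C++17 polymorphic memory resources can play that role without writing an allocator class by hand. This is a sketch under assumptions, not the approach used in the repository: PooledConfig, Field and add are hypothetical names, and the byte budget passed to std::pmr::monotonic_buffer_resource is only an initial size, so the pool falls back to its upstream resource if the budget turns out too small.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

using Field = std::pair<std::pmr::string, std::pmr::string>;

// Hypothetical container: one pool, sized in bytes, feeds both the vector
// and every string stored in it.
struct PooledConfig {
    // Declared first so it is constructed before, and destroyed after,
    // the vector that allocates from it.
    std::pmr::monotonic_buffer_resource pool;
    std::pmr::vector<Field> fields;

    explicit PooledConfig(std::size_t byte_budget)
        : pool(byte_budget), fields(&pool) {}

    void add(std::string_view key, std::string_view value) {
        assert(!key.empty());
        // The polymorphic allocator passes itself on to both strings of
        // the pair, so their character data also comes from `pool`.
        fields.emplace_back(key, value);
    }
};
```

A monotonic pool never reuses memory until it is destroyed, so every reallocation of `fields` leaves the old buffer behind in the pool. Calling `fields.reserve()` with an estimated count, or budgeting well above the file size, keeps that waste down, which matches the point above that the overhead depends on the number of keys.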

0 Answers