
I have this program I am writing that cleans up an XML file and adds new lines from a txt file to a settings section. Part of it is a section labeled // <settings> Part in my code. During or after that section (either is fine), I would like to compare the lines to make sure they are not duplicated. The comparison should ignore the setting's value (in this case true or false): two lines count as identical if they differ only in their value, e.g. one is set to true and the other to false. In that case, only the second line should be kept and the first one discarded. Here is an example of how the settings look:

    <setting1>true</setting1>
    <setting2blue>false</setting2blue>
    <setting3>true</setting3>
    <setting1>false</setting1>
    <setting4>true</setting4>
    <setting2blue>true</setting2blue>

So in the end I would like the first setting1 to be removed and the second setting1 to stay, and the same for setting2blue. Keep in mind this is just an example: the real settings have different names, and some of those names contain the same words.

I've tried to use .compare but got really lost, as I am still very new to C++. I even thought that I might need a new input stream and output stream to do the comparison after my previous work was done, but I am still getting hung up on how to compare.

I appreciate any help.

Thanks, Vendetto

Here is the part of the program I broke out so I can test it without having to run the whole thing.

#include <stdio.h>
#include <fstream>
#include <sstream>
#include <iostream>
#include <string>
#include <cctype>
#include <cstdlib>
#include <set>
#include <vector>
#include <algorithm>
#include <cassert>
#include <Windows.h>
using namespace std;


// characters stripped from every line (note: a plain ' ' is not in this list)
bool isSpace(unsigned char c) {
    return ( c == '\r' ||
        c == '\t' || c == '\v' || c == '\f');
}


int main()
{


    const string Dir{ "C:/synergyii/config/" };
    ifstream in_config{ Dir + "clientconfig.xml" },
        in_newlines{ Dir + "newlines.txt" };
    ofstream out{ Dir +  "cltesting.txt" };


    vector<string> vlines31;
    vector<string> vlines32;
    set<string>    slines31;
    set<string>    slines32;


    // copy the header lines up to and including the <settings> line
    for (string line31; getline(in_config, line31); vlines31.push_back(line31))
        if (line31.find("<settings>") != string::npos) {
            vlines31.push_back(line31);
            break;
        }


    for (const auto& v : vlines31)
        out << v << '\n';


    // <settings> Part
    
    // read the existing settings up to </settings>, dropping exact duplicates
    for (string line32; getline(in_config, line32) && line32.find("</settings>") == string::npos; ) {
        line32.erase(remove_if(line32.begin(), line32.end(), isSpace), line32.end());
        line32.erase(line32.find_last_not_of(" ") + 1);
        const auto& result = slines32.insert(line32);
        if (result.second)
            vlines32.push_back(line32);
    }


    // append the new lines from newlines.txt, again dropping exact duplicates
    for (string line32; getline(in_newlines, line32);) {
        line32.erase(remove_if(line32.begin(), line32.end(), isSpace), line32.end());
        const auto& result = slines32.insert(line32);
        if (result.second)
            vlines32.push_back(line32);
    }


    // note: the set above already rejected exact duplicates, so this call changes nothing
    vlines32.erase(unique(vlines32.begin(), vlines32.end()), vlines32.end() );


    for (auto it = vlines32.cbegin(); it != vlines32.cend(); ++it)
        out << '\t' << '\t' << *it << '\n';


    out << '\t' << "</settings>\n";
    out << "</config>\n";


    in_config.close();
    out.close();
}
Vendetto

1 Answer


A note about XML first:

XML allows formatting which doesn't necessarily change the meaning of its contents. Besides indentation, an element might be written on one line or spread over multiple lines. It's even allowed to write the whole XML file on one line (assuming there are no newlines in the elements' contents, as in OP's case).

Reading XML correctly with C++ standard I/O is more complicated than a few std::getline() calls. To do it right, an XML library should be used to read the XML file into a DOM and do the intended processing there.
E.g. the SO question "What XML parser should I use in C++?" provides an overview of available XML libraries.


That being said, I want to demonstrate a possible solution for OP's question, but using a different, even simpler config format: key-value pairs separated by a colon (:).

How to filter out duplicated keys:

The solution is actually simple:
The whole file is read line by line into a vector of strings.
If a line contains a key, the key is stored in a look-up table together with the index of that line.
If the key was already in that look-up table, the previous occurrence (line) is marked as invalid. To keep it simple, I just clear the line. If empty lines may be valid contents (which shall be kept in the file), something else should be used to mark the line, e.g. an extra bool stored with each line.
I didn't consider removing lines as an option because this would invalidate the stored line indices of all keys on the following lines (or I would have to iterate through the look-up table to fix them).
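A minimal sketch of the extra-bool variant, for the case where empty lines have to survive. The names Line and dedupKeepLast are made up for this illustration, and the key is assumed to be everything before the first ':' as in the demo. When a key repeats, the sketch also updates the stored index, so a key that occurs three or more times keeps only its last line.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: a line plus a flag telling whether it is still valid.
// Unlike clearing the string, this keeps intentionally empty lines intact.
struct Line {
  std::string text;
  bool valid;
};

// Hypothetical helper: keeps only the last occurrence of every "key: value" line.
// Assumption: the key is everything before the first ':' (as in the demo).
std::vector<std::string> dedupKeepLast(const std::vector<std::string> &input)
{
  std::vector<Line> lines;
  std::map<std::string, std::size_t> lut; // key -> index of its latest line
  for (const std::string &text : input) {
    const std::size_t iLine = lines.size();
    const std::size_t i = text.find(':');
    if (i != std::string::npos) {
      std::string key = text.substr(0, i);
      const auto iter = lut.find(key);
      if (iter != lut.end()) {
        lines[iter->second].valid = false; // invalidate the previous occurrence
        iter->second = iLine;              // remember the newest occurrence
      } else {
        lut.emplace(std::move(key), iLine);
      }
    }
    lines.push_back(Line{text, true});
  }
  // collect the surviving lines in their original order
  std::vector<std::string> result;
  for (const Line &line : lines)
    if (line.valid) result.push_back(line.text);
  return result;
}
```

Returning a new vector instead of erasing in place sidesteps the index-invalidation problem mentioned above.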

Demo:

#include <cstddef>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// all lines read so far (a cleared line marks an overridden setting)
std::vector<std::string> lines;

// key -> index of the line where the key occurred last
using LookUpTable = std::map<std::string, std::size_t>;

LookUpTable lut;

std::istream& readLine(std::istream &in)
{
  std::string line; if (!std::getline(in, line)) return in;
  const std::size_t iLine = lines.size();
  // extract key
  const std::size_t i = line.find(':');
  if (i != std::string::npos) { // Does the line have a key at all?
    std::string key = line.substr(0, i);
    // look whether there was already this setting
    const LookUpTable::iterator iter = lut.find(key);
    if (iter != lut.end()) { // Was it already there?
      // clear previous line and remember the new one
      // (emplace() would not overwrite an existing entry)
      lines[iter->second].clear();
      iter->second = iLine;
    } else {
      // store key and line index
      lut.emplace(std::move(key), iLine);
    }
  }
  // store line in lines buffer
  lines.push_back(std::move(line));
  // done
  return in;
}

void readFile(std::istream &in)
{
  while (readLine(in));
}

void writeFile(std::ostream &out)
{
  for (const std::string &line : lines) {
    // skip empty lines
    if (line.empty()) continue;
    // write non-empty lines
    out << line << '\n';
  }
}

int main()
{
  std::string sample = R"(# sample config file
setting1: true
setting2blue: false
setting3: true
setting1: false
setting4: true
setting2blue: true
)";
  // read the sample
  { std::istringstream in(sample);
    readFile(in);
  }
  // write the sample (with clean-up)
  std::cout << "Output:\n";
  writeFile(std::cout);
}

Output:

Output:
# sample config file
setting3: true
setting1: false
setting4: true
setting2blue: true

Live Demo on coliru
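For OP's actual file, only the key extraction has to change: the key is the element name between '<' and '>'. The sketch below assumes one element per line without attributes, exactly like the sample in the question; it is not a general XML solution (see the note about XML above). It also swaps in a different technique than the demo: instead of a look-up table of line indices, it scans the lines backwards and keeps the first occurrence of each name it meets, which is the last one in file order. tagName and keepLastPerTag are hypothetical helper names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the element name of a one-line element such as
// "<setting1>true</setting1>" ("setting1"), or "" if the line has no tag.
// Assumption: no attributes, one element per line.
std::string tagName(const std::string &line)
{
  const std::size_t open = line.find('<');
  if (open == std::string::npos) return "";
  const std::size_t close = line.find('>', open + 1);
  if (close == std::string::npos) return "";
  return line.substr(open + 1, close - open - 1);
}

// Hypothetical helper: keeps the last line per element name; lines without a tag
// are kept as-is. Walks backwards, so the first occurrence seen is the one to keep.
std::vector<std::string> keepLastPerTag(const std::vector<std::string> &lines)
{
  std::unordered_set<std::string> seen;
  std::vector<std::string> kept;
  for (auto it = lines.rbegin(); it != lines.rend(); ++it) {
    const std::string name = tagName(*it);
    if (name.empty() || seen.insert(name).second)
      kept.push_back(*it);
  }
  // restore the original order of the surviving lines
  return std::vector<std::string>(kept.rbegin(), kept.rend());
}
```

In OP's program, something like vlines32 = keepLastPerTag(vlines32); after the // <settings> Part loops would do the filtering. Be aware that the std::set in those loops keeps the first of two identical lines, so a setting that flips back to an earlier value (true, false, true) would end up as false; dropping the set and relying on this pass alone avoids that.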

Nit-picking:

A std::unordered_map may provide faster look-up than a std::map (average constant time instead of logarithmic), possibly at the cost of a higher memory footprint. I doubt that this difference matters for this task, but with a minimal change the code works with a std::unordered_map as well:

Live Demo on coliru
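A sketch of what such a minimal change might look like, written as one self-contained function instead of globals. Only the include and the LookUpTable alias differ from a std::map version, since find(), end() and operator[] exist for both containers; operator[] also updates the stored index when a key repeats. cleanUp is a made-up name, and like the demo it drops empty lines.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Same look-up table idea as the demo, but backed by a hash map.
using LookUpTable = std::unordered_map<std::string, std::size_t>;

// Hypothetical helper: reads "key: value" lines from text and returns them
// with all earlier occurrences of a repeated key removed.
std::vector<std::string> cleanUp(const std::string &text)
{
  std::istringstream in(text);
  std::vector<std::string> lines;
  LookUpTable lut; // key -> index of the line where the key was seen last
  for (std::string line; std::getline(in, line); ) {
    const std::size_t i = line.find(':');
    if (i != std::string::npos) {
      const std::string key = line.substr(0, i);
      const LookUpTable::iterator iter = lut.find(key);
      if (iter != lut.end()) lines[iter->second].clear(); // drop previous occurrence
      lut[key] = lines.size(); // record (or update) the newest occurrence
    }
    lines.push_back(line);
  }
  // keep only the non-empty lines, as writeFile() does in the demo
  std::vector<std::string> result;
  for (const std::string &line : lines)
    if (!line.empty()) result.push_back(line);
  return result;
}
```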

Scheff's Cat