0

I'm dealing with an application, that needs to read in 10+ CSV-files (of different kind) as input. The data is read into a container -- std::map or vector.

Previously each kind had its own parsing function, and I'm working on unifying that into a single templatized function: to have save on future code-maintenance and provide uniform error-reporting for broken files.

This function reads each line, discerns, whether the container type has the concept of a key (like map) and uses emplace for those, and emplace_back for the others (like vector).

The only expectation from the value class of the container is that its constructor can instantiate from a CSV-line. Any exception by the constructor is a fatal error -- the input filename and line-number are reported, and the program exits:

try {
        if constexpr (is_associative_container<Container>(NULL)) {
                result->emplace(typename Container::key_type(
                    key, keylen), value);
        } else {
                result->emplace_back(value);
        }
} catch (const std::exception &e) {
        fprintf(stderr, "%s:%zd: %s\n", path, line, e.what());
        goto failure;
}

This all works and I'm happy -- about 75% of the CSV-parsing is now done by this function.

I'm now facing the remaining quarter of the CSV-files, which are less straightforward: because certain rows in them require special treatment and their content isn't supposed to be stored in the container.

How can the value_type's constructor signal to the function, that the exception it is throwing should not considered fatal? One way is to pick one of the standard exceptions (std::bad_function_call?) as a signal, but that'd mean, the picked exception mustn't occur unexpectedly -- which is unreliable...

Anything else?

Jarod42
  • 203,559
  • 14
  • 181
  • 302
Mikhail T.
  • 3,043
  • 3
  • 29
  • 46
  • `throw(...)` was removed from C++ with C++17 (C++20 for `throw()`). There is no typed exception specification anymore. A function can either potentially throw any type as exception or can't throw at all. – user17732522 Aug 25 '23 at 15:58
  • Aside: why is `is_associative_container` a function that takes a pointer, and not a trait? – Caleth Aug 25 '23 at 16:03
  • It's impossible even in old C++. A dynamic exception specification is not considered part of a function’s type. – 273K Aug 25 '23 at 16:03
  • 1
    You can convert an unlisted exception to your custom exception. If the function throws an exception of the type not listed in its exception specification, the function `std::unexpected` is called. The default function calls `std::terminate`, but it may be replaced by a user-provided function (via `std::set_unexpected`) which may call `std::terminate` or throw an exception. – 273K Aug 25 '23 at 16:06
  • Are you in control of the types that are stored in these containers, or do they come from third parties? – Caleth Aug 25 '23 at 16:07
  • Wow, no more `throw`-declaration? Thanks, I edited the question... – Mikhail T. Aug 25 '23 at 16:07
  • @Caleth, I have some control over the `value_type` -- I can modify their constructors, for example -- but I'd like there to be the fewest limitations possible. My implementation for the `is_associative_container` is based on [this](https://stackoverflow.com/questions/21660325/) and I admit, I don't fully understand it myself :) – Mikhail T. Aug 25 '23 at 16:09
  • 2
    `struct non_fatal_error : std::except { };` and `struct fatal_error {};`. Use the appropriate one in the `throw`. – Eljay Aug 25 '23 at 16:15
  • You're suggesting a custom exception type, @Eljay, but that means, the `value_type` code has to include my header -- in order to know, what to throw in the non-fatal case. I'd like for them to stand independent -- thus preference for standard exceptions. – Mikhail T. Aug 25 '23 at 16:19
  • 1
    i dont really understand the motivation. When a constructor throws then it didnt construct an object. You mean construction of this object may fail but you want to continue with parsing the remainder of the csv? (and currently one exception makes you skip the remainder) – 463035818_is_not_an_ai Aug 25 '23 at 16:24
  • Yes, currently any exception is treated as fatal -- the input CSV-file is misformatted in some way, and the application must abort. I now wish to extend the "protocol" to allow the `value_type` constructor to signal, that, although this particular line should not result in a new object-creation (and its emplacement in the container), the other rows should still be read in. – Mikhail T. Aug 25 '23 at 16:27
  • @MikhailT.: The function should take another parameter: `boolean shouldInsertIntoContainer(RowData)` – Mooing Duck Aug 25 '23 at 16:28
  • @MooingDuck, that will generally mean, parsing each row _twice_ -- once to determine, whether to insert it, and then again to actually insert it :) – Mikhail T. Aug 25 '23 at 16:31
  • @MikhailT.: No, you parse the row once, then pass it to the function, and then if the function returns true, then you insert the *same object* into the container. – Mooing Duck Aug 25 '23 at 16:31
  • The row-parsing is specific to each `value_type` -- and thus should happen in the context of that class, whatever it is. The vast majority of rows result in a proper object, so I want to continue to use `emplace` to store those objects in the container from the beginning. _Except_ those few, that are "special". – Mikhail T. Aug 25 '23 at 16:36
  • the code to parse the line and type with the constructor is written by someone else? Then let them write the code to insert into the container. Pass them a reference to the container and let them do whatever they need to do. Only they can know under what conditions a line must be skipped or inserted – 463035818_is_not_an_ai Aug 25 '23 at 16:37
  • `emplace` constructs the object directly in the container -- which is why we like it. I don't want to construct each object elsewhere first, and then only copy into the container _if_ that hypothetical special function approves. – Mikhail T. Aug 25 '23 at 16:39
  • 1
    There we go: "The row-parsing is specific to each value_type -- and thus should happen in the context of that class, whatever it is." replace "row-parsing" with "error handling and decision to insert or not" and you have the solution I just suggested. – 463035818_is_not_an_ai Aug 25 '23 at 16:39
  • you must misunderstand something. I dont think anybody suggested to make such copies. At least I didnt. I am suggesting to move the decision to insert or not into the type. The code you are showing in the question doesnt need to know whether an object was inserted. It only need to get `key, keylen, value` from somewhere and pass it elsewhere. – 463035818_is_not_an_ai Aug 25 '23 at 16:41
  • The hypothetical function you're suggesting needs to parse the `value`. This parsing would be type-specific (so it'd need to be a static method of the `value_type` class), and will likely have to be _repeated_ by the actual constructor (non-static) of the type again. I don't want that repetition. – Mikhail T. Aug 25 '23 at 16:45

3 Answers3

2

A question edit overlapped with writing this answer. The original answer is below.

How can the value_type's constructor signal to the function, that the exception it is throwing should not considered fatal?

Note that in your current code you already do make a distinction between exceptions inherited from std::exception and those that do not inherit from std::exception. Good style is to inherit all exceptions from std::exception but reality is usually different.

You could introduce some special type of exception to be thrown:

            try {
                 //...
            } catch (const non_fatal_exception &e) {
                    // do something
            } catch (...) { // all other exceptions are "fatal"
                    fprintf(stderr, "%s:%zd: %s\n", path, line, e.what());
                    goto failure;
            }

Old answer about dynamic exception specifications...

As mentiond in comments, dynamic exception specifications are removed from C++17.

Before C++17 a function that did throw an exception not listed in the exception specification did the following (from cppreference):

If the function throws an exception of the type not listed in its exception specification, the function std::unexpected is called. The default function calls std::terminate, but it may be replaced by a user-provided function (via std::set_unexpected) which may call std::terminate or throw an exception. If the exception thrown from std::unexpected is accepted by the exception specification, stack unwinding continues as usual. If it isn't, but std::bad_exception is allowed by the exception specification, std::bad_exception is thrown. Otherwise, std::terminate is called.

There is no way out unless you know some exception that would be accepted from the exception specification, but in general you do not know that. I am not aware of deducing the "allowed" exceptions in generic code. Anyhow, the feature is removed. And in some sense it was already "fatal" to throw an exception not listed in the exception specification, its not something you have to do extra.

463035818_is_not_an_ai
  • 109,796
  • 11
  • 89
  • 185
  • There is no `e` -- and thus, no `e.what()` -- in the `catch (...)` case. But yeah, this seems like the general approach. The `non_fatal_exception` type can be simply `const char *` :-) – Mikhail T. Aug 25 '23 at 16:51
0

If it's normal that the row shouldn't go into the container, then that's not an exceptional flow, and you shouldn't be using exceptions. The function should take another parameter: Function<bool(RowData)> shouldInsertIntoContainer

Mooing Duck
  • 64,318
  • 19
  • 100
  • 158
  • These special rows still need parsing -- for example, to modify the static-members of the container's `value_type`. They just mustn't result in a new object being stored in the container. – Mikhail T. Aug 25 '23 at 16:32
  • @MikhailT.: not an issue. You need to parse it anyway in order to pass it to `shouldInsertIntoContainer` in the first place. – Mooing Duck Aug 25 '23 at 16:48
  • Yes, it'll need to be parsed by the static `shouldInsertIntoContainer()` first, and then _again_ by the constructor -- during the actual insertion. The repetition I'd like to void. – Mikhail T. Aug 25 '23 at 16:53
  • @MikhailT.: Why can't you just parse it once? That sounds like there's an additional complexity you hadn't mentioned. – Mooing Duck Aug 25 '23 at 17:00
  • The parsing currently creates the object of type `Container::value_type`. You're suggesting, I parse it _before_ constructing this object -- where would the parsing-results live, while the decision is being made on whether the object should be constructed? – Mikhail T. Aug 25 '23 at 17:07
  • @MikhailT.: I don't know your code. Either (A) You can pre-parse to a temp object, pass it to `shouldInsertIntoContainer`, and then construct` value_type` from the temp object and append, or (B) You can parse a `value_type`, then pass that to `shouldInsertIntoContainer`, and then append it. Depends on if the type can be constructed but not inserted or not. – Mooing Duck Aug 25 '23 at 17:18
  • Only the (B) can possibly work, because the CSV-loading function has no idea, what `value_type` needs. So the object must be constructed -- temporarily -- and then examined and stored in the container depending on the results of the examination. That'd work, of course, but will also defeat the purpose of `emplace` -- tens of thousands of objects would have to be copied into the container after being constructed instead of being constructed in the container directly, all to allow for 5-10 exceptional rows to avoid being `emplace`d into same... – Mikhail T. Aug 25 '23 at 17:24
  • 1
    @MikhailT.: move them into the container instead of copying. – Mooing Duck Aug 25 '23 at 17:47
  • The "moving" in RAM consists of copying from source to destination, followed by freeing the source :-) But, even if it didn't -- it still wouldn't have been free... – Mikhail T. Aug 25 '23 at 18:27
  • 1
    @MikhailT. If you're moving plain-old-data, then nothing needs freeing. If it's not plain-old-data, then you can just move a couple pointers and still not free anything. Your comments are baffling to me. – Mooing Duck Aug 25 '23 at 19:34
  • The "nothing needs freeing" is baffling, indeed. If a `char a[1024]` is stored in a container, what happens to the original? How does the memory become available for other uses? Of course, it needs to be marked as free somehow -- either by a literal call to `free()` or something else. Bottom line is, the "moving" is not free -- indeed, it is simply a copying... – Mikhail T. Aug 25 '23 at 19:44
  • if it's a `char a[1024]`, then it's automatic scope, and freeing is pretty much a no-op. But you're right that the move is inconveniently expensive for that case. I hadn't imagined your elements were that large and also auto-scoped. Anything that needs `free()` is also cheap to move. – Mooing Duck Aug 25 '23 at 19:47
  • Whatever the object's size, point remains -- ["moving" an area in RAM is simply a copying](https://man.freebsd.org/cgi/man.cgi?query=memmove&sektion=3) (possibly followed by some time of freeing). Further, whether you think, such moving of an existing object into a container is cheap or expensive, it costs _something_, and `emplace` was designed and implemented to avoid that expense. – Mikhail T. Aug 25 '23 at 20:06
  • 1
    @MikhailT.: If it's dynamically scoped, then "moving" is simply a single pointer write. You're right that it's not free, but it's pretty darn close to it. – Mooing Duck Aug 25 '23 at 20:38
0

Ok, based on my own inclination -- and the suggestions by @Eljay (comment) and @463035818_is_not_an_ai (accepted answer), I modified the code thus:

        try {
            if constexpr (is_associative_container<Container>(NULL)) {
                result->emplace(typename Container::key_type(
                    key, keylen), value);
            } else {
                result->emplace_back(value);
            }
        } catch (const std::string &message) {
            /*
             * Allow constructors to fail without aborting the
             * parsing of the file by throwing an std::string.
             * If the string is not empty, output it to stderr.
             */
            if (!message.empty())
                fprintf(stderr, "%s:%zd: %s\n", path, line,
                    message.c_str());
            continue;
        } catch (const std::exception &e) {
            fprintf(stderr, "%s:%zd: %s\n", path, line, e.what());
            goto failure;
        } catch (...) {
            fprintf(stderr, "%s:%zd: internal error while "
                "emplacing into %s\n",
                path, line, typeid(Container).name());
            goto failure;
        }
Mikhail T.
  • 3,043
  • 3
  • 29
  • 46
  • 1
    You shouldn't use any existing class for this (including `std::string`). The only guarantee you have that a particular class isn't thrown by library code is if it's a class you defined yourself. Make your own class inheriting `std::exception`, you'll get `e.what()` in the bargain, and have the parsers use your specific class for resumable failures. – Ben Voigt Aug 25 '23 at 17:38
  • Thanks, @BenVoigt, but this is a minor -- _cosmetic_ -- point. Our classes may throw own exceptions (all derived from std::exception) -- or those thrown by the standard library code, which are also proper. The remote chance of there some day being some code throwing a `std::string` unexpectedly does not outweigh the inconvenience of requiring these classes to `#include` the declaration of the _special_ exception-class... – Mikhail T. Aug 25 '23 at 18:31
  • @MikhailT. I agree that a conflict is unlikely, but I would argue you should still create your own exception class because of readability. If I see something throwing a string, it will surprise me and I'll have to dig through the code to find out what it means, because the type doesn't give me any information. If you threw an instance of, say, `RecoverableParsingError`, it would be clear and memorable immediately. Also, while there are exceptions (har har), it is generally good practice to only throw std::exception derived things. – Wutz Aug 25 '23 at 22:50
  • There is also Occam and his famous Razor... – Mikhail T. Aug 26 '23 at 01:50