-6

To solve my problem, I want to know whether, and how, I can declare the second command-line parameter of main in a form other than char** argv or char* argv[]. The reason is that pybind11 does not accept either of those as a function parameter. Here are the methods I have tried:

Method 1:

#include <stdio.h>

int main(int argc, int* argv_){
    for (int i = 0; i < argc; ++i){
        printf("%s\n", (char *)(argv_[i]));
    }
}

The rationale behind this method is that a pointer is intrinsically an integer and by casting the address to a char pointer, one should be able to get the strings. Thanks for your kind support in advance.

Method 2:

#include <stdio.h>
#include <string>

int main(int argc, std::string* argv_){
    for (int i = 0; i < argc; ++i){
        printf("%s\n", argv_[i].c_str());
    }
}

Method 3:

#include <stdio.h>
#include <string>
#include <vector>

int main(int argc, std::vector<std::string> argv_){
    for (int i = 0; i < argc; ++i){
        const char* argv__ = argv_[i].c_str();
        printf("%s\n", argv_[i].c_str());
    }
}

issue:

Unfortunately, all of the above methods lead to the infamous segmentation fault.

I would appreciate it if you could help me understand what the problem is (i.e., where the memory leak is) and how to solve it.

workaround/hack:

In the comments I'm being told that using any form other than main(), main(int argc, char** argv), or main(int argc, char* argv[]) will unavoidably lead to a segmentation fault. However, the code below works:

#include <stdio.h>

int main(int argc, long* argv_){
    for (int i = 0; i < argc; ++i){
        printf("%s\n", (char *)(argv_[i]));
    }
}

This works with g++ 7.4.0 on a minimal Ubuntu installation and with the Visual Studio 2019 compiler on Windows 10. However, it does not compile with clang. As others have pointed out, this is not a solution and is very bad practice. It can cause undefined behavior depending on the compiler, the operating system and the current state of memory. It should never be used in actual code. The main function in any C/C++ program must have one of the forms main(), main(int argc, char** argv), or main(int argc, char* argv[]).
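One likely reason the int* variant crashes while the long* variant appears to work is integer width. The sketch below compares the sizes involved; the helper names can_hold_pointer and print_widths are hypothetical and not part of the original code. On a typical 64-bit Linux build an int is narrower than a pointer, so reading argv through an int* yields partial addresses, while long happens to be as wide as a pointer there. That is an implementation detail rather than a guarantee, and the behavior stays undefined either way.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: true if an object of type T is at least as wide as a
// data pointer on the current platform.
template <typename T>
constexpr bool can_hold_pointer() {
    return sizeof(T) >= sizeof(void*);
}

// Prints the widths that matter for the int* and long* variants above.
void print_widths() {
    std::printf("sizeof(int)            = %zu\n", sizeof(int));
    std::printf("sizeof(long)           = %zu\n", sizeof(long));
    std::printf("sizeof(void*)          = %zu\n", sizeof(void*));
    std::printf("sizeof(std::uintptr_t) = %zu\n", sizeof(std::uintptr_t));
}
```

If an address ever has to be stored in an integer, std::uintptr_t is the type intended for that purpose. It does not make a non-standard main signature valid.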

Foad S. Farimani
  • dear @kaylum I have intentionally included both `C` and `C++` tags. As you may see in the question I have used standard libraries of both languages. It is a mixed code. – Foad S. Farimani Feb 03 '20 at 22:23
  • 5
    I'm confused. Nothing should be calling `main`, so this question doesn't make sense to me – Mooing Duck Feb 03 '20 at 22:23
  • 3
    No, you can define `main` as `int main();` or `int main(int, char**);` or `int main(int, char* []);`. Sometimes a third environment array exists too. You should probably not pybind `main` http://eel.is/c++draft/basic.start.main#3.sentence-1 – Ted Lyngmo Feb 03 '20 at 22:23
  • 4
    @Foad: There isn't really such a thing as mixed code. You're compiling this in C++ clearly, since you're using C++ classes. – Mooing Duck Feb 03 '20 at 22:24
  • 1
    https://en.cppreference.com/w/cpp/language/main_function – Thomas Sablik Feb 03 '20 at 22:26
  • @MooingDuck this is obviously a [mcve](https://stackoverflow.com/help/minimal-reproducible-example). in the actual case, the function is not main and it will be called as an object inside the python script, using pybind11. – Foad S. Farimani Feb 03 '20 at 22:28
  • Dear @MooingDuck thanks for the edits, but I would appreciate if you would have consulted me before applying them. Now the above codes don't have a `main` function and they do not satisfy as minimal, complete and verifiable example. – Foad S. Farimani Feb 03 '20 at 22:33
  • 4
    @Foad: They weren't a minimal, complete and verifiable example anyway, since there wasn't enough code there to verify the problem. The use of `main` also added a large amount of confusion, as noted in all the comments, and all answers. If `main` is not the method in question, then the confusion goes away. – Mooing Duck Feb 03 '20 at 22:38
  • Geez, what a mess. Sorry for the rollback, @MooingDuck was right of course and I shouldn't have reverted. – DevSolar Feb 03 '20 at 22:42
  • Are you trying to bind main to pybind? What is your actual problem? The other answers have stated the valid forms of main. – Indiana Kernick Feb 03 '20 at 22:44
  • Dear @Kerndog73 The main issue has been explained [here](https://stackoverflow.com/q/60040665/4999991) and [here](https://stackoverflow.com/a/60039323/4999991). But I didn't include pybind11, to avoid confusion. – Foad S. Farimani Feb 03 '20 at 22:47
  • 3
    @Foad, I don't think you *should* bind `main` to `pybind`. The `main` function is pretty special, and the C++ program may make assumptions (like registering functions with `atexit`) which your Python caller would not, could not, satisfy. – DevSolar Feb 03 '20 at 22:48
  • I wish my fellow SO users were a little bit more patient. editing my post so fast and downvoting is not that helpful. I have asked help to find the source of segmentation fault. – Foad S. Farimani Feb 03 '20 at 22:49
  • You're getting a seg fault because you're using non-standard forms of main. The deleted answers explained this in detail. – Indiana Kernick Feb 03 '20 at 22:50
  • 3
    @Foad: XY problem... `main` has a `char **` second parameter, which is not supported by `pybind`, so don't try to put a square peg through a round hole. Don't try to bind `main` (making your would-be MCVE not an MCVE), and use [supported types](https://pybind11.readthedocs.io/en/stable/advanced/cast/index.html) as per the docs. – DevSolar Feb 03 '20 at 22:50
  • 1
    Ted's answer seems to be what you're looking for. – Indiana Kernick Feb 03 '20 at 22:51
  • @DevSolar I don't have to bind `main` and in reality, the name will be for sure different. But to make the questions small and contained I have intentionally tried to keep pybind11 out of the picture. The only reason I mentioned it was because in the previous question I was asked why I can't have double pointers in the code. – Foad S. Farimani Feb 03 '20 at 22:51
  • @Foad: Well, don't use double pointers in your C++ code. They should hardly be necessary, and they are not supported by `pybind`, as you've been told... – DevSolar Feb 03 '20 at 22:52
  • @Kerndog73 Ted's answer is wrong because I have clearly mentioned that nowhere in the code I can have `char**` variable definitions and the inputs in the functions can't be of the type `char* var[]` either. – Foad S. Farimani Feb 03 '20 at 22:53
  • 3
    @Foad: Stop, think, and read the documentation I linked, which includes examples for `std::string` and `std::vector`. As soon as it's not `main` you want to link, the requirement for `char**` goes away. – DevSolar Feb 03 '20 at 22:54
  • What is the reason for this restriction? – Indiana Kernick Feb 03 '20 at 22:54
  • 1
    @DevSolar [here in this question](https://stackoverflow.com/q/60040665/4999991) I have explained why I have to have them. basically there is a closed source library with a function that asks for those variables. I have found a bad workaround [here](https://stackoverflow.com/a/60039323/4999991). But I'm looking for a more canonical solution. – Foad S. Farimani Feb 03 '20 at 22:55
  • 1
    @Kerndog73 as I have explained [here](https://stackoverflow.com/a/60039323/4999991), pybind11 does not support double pointers anywhere in the code and `char* var[]` for the function inputs. more info [here](https://github.com/pybind/pybind11/issues/417#issuecomment-248150544). – Foad S. Farimani Feb 03 '20 at 22:57
  • Foad, I fear this question is beyond repair (and the others are, as well). The title is utterly misleading, and look at the mess we have as a result, with edits, reverts, and all the comments. You need to focus, and stop talking about "the command line arguments" when it's not `main` you're linking and simply about how to pass an array of strings from Python to a function taking an array of C strings. You keep bringing in restrictions that simply aren't there... – DevSolar Feb 03 '20 at 23:00
  • 3
    The linked github issue says that you can't bind a function with a `char **` to pybind. It doesn't say that they can't appear anywhere in the code! Pybind is only a library (just plain old C++), it can't impose that sort of restriction on you. From Ted's answer, you can bind `cppmain` to pybind just fine. If you need to call it from main, you can. – Indiana Kernick Feb 03 '20 at 23:00
  • Dear @Kerndog73 I have tested both `char** argv` and `char* argv[]` in the body and in the function inputs. AFAIK neither work in the function inputs, and `char**` type doesn't compile anywhere in the code. – Foad S. Farimani Feb 03 '20 at 23:03
  • @Foad But, you are not pybinding to the actual executable that has the `main` function in it, are you? – Ted Lyngmo Feb 03 '20 at 23:04
  • @DevSolar I would appreciate it if you would put pybind11 out of the picture for a moment. My question is about the memory leak because I want to find a workaround to define the second argument in a different format than what is standard. – Foad S. Farimani Feb 03 '20 at 23:04
  • @TedLyngmo In reality not. but I would appreciate if you could help me find an answer to the question, regardless of the pybind11. – Foad S. Farimani Feb 03 '20 at 23:05
  • Well, can you change it so the function name isn't `main`? You've already gotten answers to why that will never work. – Ted Lyngmo Feb 03 '20 at 23:06
  • 4
    (ctd.) Since you aren't going to call `main` at all (as you stated), that segfault problem is *completely unrelated* to what you're actually trying to achieve; so it's not productive to ask for a "solution" for it. – DevSolar Feb 03 '20 at 23:08
  • Dear @DevSolar I understand that you want to help me, and I really appreciate that. But I would appreciate it if we could focus on the memory leak, which is the main core of the question. – Foad S. Farimani Feb 03 '20 at 23:09
  • 2
    @Foad As your question is currently stated, the problem is that the function has the wrong signature. – Ted Lyngmo Feb 03 '20 at 23:10
  • 1
    @Foad: ...**but unrelated to your actual problem**, which is how to call that closed-source library function from Python. The solution is through a marshalling function that converts from what pybind supports to what your library supports. Look up the definition of "[XY problem](https://en.wikipedia.org/wiki/XY_problem)", I'm not referencing it by accident. – DevSolar Feb 03 '20 at 23:10
  • 3
    The main function can be `int main()`, `int main(int, char **)` or `int main(int, char*[])`. If you try something else, you'll get a seg fault. **There is no way around that!** If that's all your question was about (clearly it isn't), you would have accepted an answer already. – Indiana Kernick Feb 03 '20 at 23:11
  • @Kerndog73 OK. Now that is a good answer. I wish someone would have explained that from the beginning. I would appreciate it if you could provide me references to read more about this. Thanks in advance. – Foad S. Farimani Feb 03 '20 at 23:13
  • 3
    @Foad: It is *literally* what I wrote in my answer (including reference), which you commented on that you were "aware of the standard"... – DevSolar Feb 03 '20 at 23:14
  • There are plenty of comments and two deleted answers explaining what I just explained. – Indiana Kernick Feb 03 '20 at 23:14
  • @DevSolar I'm not sure if I have noticed that. And it is unfortunately removed now. – Foad S. Farimani Feb 03 '20 at 23:15
  • @Kerndog73 I don't have access to the deleted answers. – Foad S. Farimani Feb 03 '20 at 23:16
  • 2
    @Foad: The main problem here (pun intended) is that you made this question about one thing (`main` and its parameters), but were actually looking for the answer to another thing entirely. Which is also why this question got downvoted, and closed. (And why I am voting to delete it, because it will not be helpful to anyone, and can't be salvaged IMHO.) – DevSolar Feb 03 '20 at 23:18
  • @Kerndog73 I provided a counter example in the P.S. How do you think about that? – Foad S. Farimani Feb 03 '20 at 23:39
  • 2
    @Foad "_I wish someone would have explained that from the beginning_" - Look at the third comment from the top. It tells you what signatures `main` can have **and** it links to the draft of the upcoming C++20 standard. – Ted Lyngmo Feb 03 '20 at 23:57
  • @Kerndog73 I'm using a minimal Ubuntu, with g++ 7.4.0. It is not nice to call my environment crappy sir. – Foad S. Farimani Feb 04 '20 at 00:02
  • @TedLyngmo I actually looked at that page. There are no mentions of `segmentation fault`. I am aware of the correct/standard form of the command line arguments. The question is how one can hack around it. – Foad S. Farimani Feb 04 '20 at 00:03
  • 4
    Using a different set of arguments from that defined by the standard is invoking *undefined behavior*. You will not find the term "segmentation fault" in the standard because a segmentation fault is a mechanism by the OS to protect against invalid memory accesses, as could be triggered by undefined behavior -- but there is no guarantee whatsoever that this, or any other defined error handling, will happen -- because the behavior is **undefined**. And you **don't** "hack around it", you use a proper solution to your **actual** problem, which has nothing to do with `main` or segfaults... – DevSolar Feb 04 '20 at 00:12
  • 1
    The segmentation fault is one possible outcome of having a program with undefined behaviour. What you get in `argv` is an array of pointers. Each pointer points at a null terminated C string. If you try to read the array as something it isn't you will get problems. – Ted Lyngmo Feb 04 '20 at 00:12
  • 3
    And "but it works" is just one of the possible outcomes of undefined behavior, if you *happen* to do something that's not defined by the language specs but doesn't result in *immediate* disaster. This "working" can stop on a different machine, a different compiler, or even just a different compiler option. It might just *look* like it's "working", and croak on you later on with no hint as to why or how. And a "successful" compiler run doesn't mean that the code will do anything meaningful, or even reproducible, at runtime. *Do not invoke undefined behavior.* – DevSolar Feb 04 '20 at 00:15
  • Dear @DevSolar you were right. `int` was not big enough for pointers. I used `long`. Now I'm able to print out the first characters of the input arguments. – Foad S. Farimani Feb 04 '20 at 00:19
  • 3
    You are still (willfully?) ignoring several glaring core issues of your code, your approach to coding, and your approach to handling this Q&A format. C++ is very much **not** a trial and error language. – DevSolar Feb 04 '20 at 00:21
  • Dear @DevSolar Thanks for your kind support. I actually found the workaround I was looking for. You may check the last edits above. – Foad S. Farimani Feb 04 '20 at 00:27
  • How did that solve anything? You supply a pointer variable of the wrong type and cast it to the correct type afterwards. Will that let you pybind to this function? – Ted Lyngmo Feb 04 '20 at 00:32
  • 2
    Even if it "lets" you bind that function, that code is broken in more than one way. If you enable compiler warnings, your compiler will probably tell you some of them. – DevSolar Feb 04 '20 at 00:35
  • @TedLyngmo It is not a good solution. But it helped me understand that there is a workaround for this limitation, at least with gcc. Plus it is slightly better than the workaround I had before. – Foad S. Farimani Feb 04 '20 at 00:35
  • @DevSolar It is clearly not a good solution, but a workaround. I do get compiler warnings already. – Foad S. Farimani Feb 04 '20 at 00:36
  • 2
    Why not use the supported types in the function you are going to pybind? Why are you continuing to use `main` in your question? What will this hack that is likely to blow up in your face really solve? – Ted Lyngmo Feb 04 '20 at 00:37
  • 1
    More like a failaround. You have been given lots of sound advice in this comment section, and eventually decided to dismiss *all* of it. This does not bode well for your future coding endeavours. – DevSolar Feb 04 '20 at 00:38
  • @TedLyngmo fact 1. I have a closed sourced function which requires `int argc, char* argv[]` fact 2. I can't have either `char** argv` nor `char* argv[]` in the inputs of the functions fact 3. I can't have double pointers anywhere in the code. – Foad S. Farimani Feb 04 '20 at 00:41
  • 4
    @Foad: Then pybind to a function that _calls_ the closed source function. Problem solved. – Mooing Duck Feb 04 '20 at 00:55
  • 1
    Fact 4: You should not call `main`. Fact 5: If the function in question isn't `main`, why not make a wrapper function in C++ that takes types that pybind is happy to bind with and then create the `char*[]` and pass it to the final function. – Ted Lyngmo Feb 04 '20 at 00:56
  • 1
    @Foad I added a wrapper example too. – Ted Lyngmo Feb 04 '20 at 01:17
  • 3
    "Facts" 2 and 3 are imprecise. You cannot **pybind** to a `char**` function. That does, however, not mean that you "cannot have `char**` anywhere in your code", as Ted had the patience to showcase. – DevSolar Feb 04 '20 at 06:13
  • @DevSolar thanks for the time you spent on this question. to stop this unhealthy discussion I tried to delete the post however it doesn't allow me to delete it. The main cause of this misunderstanding is that you guys are trying to convince me what I need. instead of answering my main question. I did not ask for a pybind11 solution. I have asked questions about that before and I have had a couple of good answers since yesterday. Here I wanted to know if/how I can hack around the standard. This would not only work as a temporary workaround but also help me understand the standards... – Foad S. Farimani Feb 04 '20 at 08:37
  • best practices, and compiler/platform-specific undefined behaviors. However, at this moment I think the discussions here are not very professional. I wish you all the best. – Foad S. Farimani Feb 04 '20 at 08:38
  • 5
    The only point that's unprofessional is that you are ignoring / not acknowledging the answers you're given. You don't "hack around the standard", full stop. Accept, if you will, that the ones answering you actually *do* know what we're talking about, and *do* understand your problem -- even a bit better than you do. – DevSolar Feb 04 '20 at 09:06
  • 3
    Do you understand what an [XY problem](https://meta.stackexchange.com/a/66378) is? It seems like you're so caught up in the Y problem that you won't accept solutions to the X problem. – Sneaky Turtle Feb 04 '20 at 09:34
  • @SneakyTurtle I'm absolutely aware of that term. But it doesn't mean I totally agree with it. In fact, many people who invoke that terminology want to convince the person what their problem really is, instead of answering their main question first. For example, here, some are trying to convince me `segmentation fault` is unavoidable, which I proved wrong. Finding the root problem is fine as far it does not include, bullying, diminishing and ridicule. I have asked pybind11 specific questions and I linked to them for those who are kind enough and willing to answer them. – Foad S. Farimani Feb 04 '20 at 09:38
  • 2
    The seg fault is because it's undefined behaviour for the main function to have any other parameters than `main()` or `main(int, char**)`. If you use any other signature, you're invoking [undefined behaviour](https://en.cppreference.com/w/cpp/language/ub) so anything could happen. By "anything", I mean it might appear to work, or maybe it seg faults, or maybe it does something completely different. So you may be able to write something that appears to work but that **certainly doesn't mean that you should**. By the looks of things, this has been mentioned many times before. – Sneaky Turtle Feb 04 '20 at 09:49
  • 2
    Both the X and the Y problem have been solved but you aren't accepting either solution. – Indiana Kernick Feb 04 '20 at 09:53
  • @SneakyTurtle I'm not quite sure if the segmentation fault is here due to undefined behavior. I think there is actual memory leakage that could be investigated and fixed. Both GCC and visual studio accept this syntax returning reliable results. Clang, however, does not compile. Eventually, I don't think anyone here claims that this is a good practice or what I mentioned is a solution. This could be just considered as an informative Q&A so everyone involved could understand the uncharted corners of the language and different compilers. – Foad S. Farimani Feb 04 '20 at 12:40
  • 1
    **It's undefined behaviour.** – Sneaky Turtle Feb 04 '20 at 13:02
  • 1
    **Definitely undefined behaviour** – Indiana Kernick Feb 04 '20 at 13:03
  • @SneakyTurtle I have no doubt it can cause undefined behaviour depending on the compiler, operating system and the memory state. What I have proposed as a hack is a bad solution and should not be used in any actual code under any circumstances. – Foad S. Farimani Feb 04 '20 at 13:04
  • @SneakyTurtle because you don't know what you don't know. When I started this question, I had done some tests, and GCC would compile my code with some warnings. So I assumed this could be at least a temporary solution. Also, I wanted to know why specifically the segmentation fault happens and where the memory leak is. I learned a lot here, and I also found a "hack" that actually works on both GCC and Visual Studio. I also know this should not be used in any actual code. This could be useful for people who will end up here. – Foad S. Farimani Feb 04 '20 at 13:11

2 Answers

6

It doesn't look like it needs to be main after all, so you could do it like this:

#include <iostream>
#include <string>
#include <vector>

int cppmain(std::string program, std::vector<std::string> args) {
    std::cout << program << " got arguments:\n";
    for(auto& arg : args) {
        std::cout << " " << arg << "\n";
    }
    return 0;
}

int main(int argc, char* argv[]) {
    // create a string from the program name and a vector of strings from the arguments
    return cppmain(argv[0], {argv + 1, argv + argc});
}

In case you need to call a closed source main-like function (that you cannot change), create a wrapper function that you can pybind to, and let that wrapper call the closed source function.

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int closed_source_function(int argc, char* argv[]) {
    for(int i = 0; i < argc; ++i) {
        std::cout << argv[i] << '\n';
    }
    return 0;
}

int pybind_to_this(std::vector<std::string> args) {
    // create a char*[]
    std::vector<char*> argv(args.size() + 1);

    // make the pointers point to the C strings in the std::strings in the
    // std::vector (non-const std::string::data() requires C++17; with older
    // standards use &args[i][0] instead)
    for(size_t i = 0; i < args.size(); ++i) {
        argv[i] = args[i].data();
    }

    // add a terminating nullptr (main wants that, so perhaps the closed source
    // function wants it too)
    argv[args.size()] = nullptr;

    // call the closed source function
    return closed_source_function(static_cast<int>(args.size()), argv.data());
}
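A variant of the wrapper above, sketched under two assumptions: that the code has to build as C++11, and that the closed source function may write into the strings it receives. Each argument is copied into its own mutable, null-terminated buffer before the pointer array is built. The names closed_source_stand_in and call_with_owned_buffers are hypothetical; the stand-in only counts characters so that the sketch is self-contained.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the closed source function (an assumption for illustration):
// returns the total number of characters across all arguments.
int closed_source_stand_in(int argc, char* argv[]) {
    int total = 0;
    for (int i = 0; i < argc; ++i) {
        for (const char* p = argv[i]; *p != '\0'; ++p) {
            ++total;
        }
    }
    return total;
}

// Hypothetical wrapper: takes a type that pybind11 can convert and hands the
// callee pointers into buffers owned by this function.
int call_with_owned_buffers(const std::vector<std::string>& args) {
    // One mutable, null-terminated copy per argument.
    std::vector<std::vector<char>> buffers;
    buffers.reserve(args.size());
    for (const std::string& arg : args) {
        buffers.emplace_back(arg.begin(), arg.end());
        buffers.back().push_back('\0');
    }

    // Pointer array with the trailing nullptr that main-like functions expect.
    std::vector<char*> argv;
    argv.reserve(buffers.size() + 1);
    for (std::vector<char>& buffer : buffers) {
        argv.push_back(buffer.data());
    }
    argv.push_back(nullptr);

    return closed_source_stand_in(static_cast<int>(args.size()), argv.data());
}
```

The buffers live only for the duration of the call, so this is suitable only if the callee does not keep the pointers after it returns.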
Ted Lyngmo
  • unfortunately, other users have edited my question without consulting me in advance causing this confusion. the `char* arg[]` and `char** argv` can not be in any of the function's inputs. – Foad S. Farimani Feb 03 '20 at 22:37
  • 3
    They are not. The input to the `cppmain` function is a `std::string` and a `std::vector`. – Ted Lyngmo Feb 03 '20 at 22:38
  • 1
    @Foad The change to the question is why this is the only answer that's _right_ – Mooing Duck Feb 03 '20 at 22:41
5

Let's try to tackle the plethora of issues that have cropped up during the lengthy discussion, one by one.


Question 1: Why do I get a segfault when using some non-standard parameters (like string vector or int pointer) to main?

The parameter types of int, char ** are defined that way by both the C and the C++ standard. Non-standard extensions aside, you cannot use other types.

From ISO/IEC 9899 (The C Language), 5.1.2.2.1 Program startup:

The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:

int main(void) { /* ... */ }

or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):

int main(int argc, char *argv[]) { /* ... */ }

or equivalent; or in some other implementation-defined manner.

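That last sentence allows for those extensions I mentioned. One such extension I know of is the three-parameter form of main with an additional envp environment parameter, described in the GNU C Library manual: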

https://www.gnu.org/software/libc/manual/html_node/Program-Arguments.html#Program-Arguments


Question 2: How do I hack around this?

You don't.

Using different types than those defined by the standard, or by compiler extensions, is Undefined Behavior, which can -- but does not need to -- lead to segfaults. Do not invoke undefined behavior. Do not "hack around" the standard. It is not a "workaround", let alone a "solution", it is broken code that can blow up in your face any time.


Question 3: How do I pybind a third-party function that takes a char ** as parameter?

You don't, as this is not a datatype supported by pybind.


Question 4: How do I interface such a function through pybind, then?

You write a wrapper function that, on the front end, takes parameters supported by pybind (e.g. std::vector< std::string >), appropriately marshals those, and then calls the third-party backend function for you with the marshalled arguments. (Then, of course, doing the same in reverse for the return type, if required.)

For an idiomatic example on how to do that, see the answer by @TedLyngmo.
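For the reverse direction mentioned above, here is a minimal sketch that assumes the third-party function hands back a nullptr-terminated array of C strings. That return convention is an assumption for illustration, and to_string_vector is a hypothetical helper name. The strings are copied into std::string objects, giving a type that a binding layer can convert.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies a nullptr-terminated array of C strings into
// owning std::string objects. A null array pointer yields an empty vector.
std::vector<std::string> to_string_vector(const char* const* c_strings) {
    std::vector<std::string> result;
    if (c_strings == nullptr) {
        return result;
    }
    for (; *c_strings != nullptr; ++c_strings) {
        result.emplace_back(*c_strings);
    }
    return result;
}
```

Because the result owns copies, it stays valid after the third-party library frees or reuses its own memory. Who releases the original array depends on that library's documentation.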


Question 5: Can I pybind to a third-party main?

This is ill-advised, as main is a special function, and the called code may make assumptions (like atexit callbacks) that your calling code does not, and can not, comply with. It is certainly not a function the third party ever expected to be called as a library function.
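To illustrate the atexit point, here is a small sketch with hypothetical names (main_like, cleanup, cleanup_done_after_call). A main-like function registers its cleanup through std::atexit. When it is called as an ordinary function, the cleanup has not run by the time it returns, because exit handlers fire only when the whole process terminates.

```cpp
#include <cstdlib>

// Set by the exit handler; stays false until the process is exiting.
bool cleanup_ran = false;

void cleanup() { cleanup_ran = true; }

// Hypothetical main-like function that relies on std::atexit for its cleanup.
int main_like(int argc, char* argv[]) {
    (void)argc;
    (void)argv;
    std::atexit(cleanup);  // runs at process exit, not when this function returns
    return 0;
}

// Calls main_like as a library function and reports whether its cleanup has
// already happened once the call has returned.
bool cleanup_done_after_call() {
    char name[] = "prog";
    char* argv[] = {name, nullptr};
    main_like(1, argv);
    return cleanup_ran;
}
```

In a Python process, that cleanup would be deferred until the interpreter itself exits, which is one way a bound main can break the assumptions of the code it wraps.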

DevSolar
  • Thanks for the post. I'm aware of the standard, but I'm looking for a way to go around it if possible. Basically, if I can store the inputs in any type other than `char* argv[]` or `char** argv` and then cast them into those forms, the problem should be solved. – Foad S. Farimani Feb 03 '20 at 22:35
  • @Foad: See updated answer for a quick way to get `argv` into a vector of strings. That's really the best you can do. – DevSolar Feb 03 '20 at 22:38
  • @DevSolar: The confusion is because Foad is binding a random method to pybind, and this question has nothing to do with `main`. – Mooing Duck Feb 03 '20 at 22:39
  • 2
    @Foad: Completely reworked the answer. Please drop a note if any part of your question (and comments) remains unanswered. – DevSolar Feb 04 '20 at 10:35
  • @DevSolar sure. A bit busy now. But I will study it and will give you feedback. – Foad S. Farimani Feb 04 '20 at 12:25