
For my lexer I'm using the boost::wave lexical iterator, which gives me all the tokens from a .cpp, .h, .hpp, etc. file.

Now I want to determine whether a sequence of tokens, i.e. an identifier followed by an opening parenthesis, then a comma-separated list of arguments, and finally a closing parenthesis, is a function in a C++ program. In other words, how should I analyze the tokens to make sure I have a function?

I am trying to implement this using a recursive descent parser. So far my recursive descent parser can parse arithmetic expressions and handles almost all kinds of operator precedence.

Or is there a function in boost::wave that can parse a function directly for me?
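From tokens alone, `name ( ... )` can be a call, a declaration or a definition. A rough heuristic, assuming whitespace has already been filtered out and using a hypothetical simplified Token type (Classify, MatchingParen and FunctionUse are illustrative names, not Boost.Wave API): find the matching `)` while tracking nesting; a `{` right after it suggests a definition, a type-like token right before the name suggests a declaration, and anything else is treated as a call. It is easy to fool: `a * f(b)`, `T x(a);`, constructor initializer lists, `const`/`noexcept` qualifiers, trailing return types and macros all need more context.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified token kinds; whitespace is assumed to be filtered out already.
enum class Kind { Identifier, Keyword, Punct, Literal };

struct Token {
    Kind kind;
    std::string text;
};

enum class FunctionUse { NotFunctionLike, Call, Declaration, Definition };

// Index of the ')' matching the '(' at 'open', or tokens.size() if unbalanced.
std::size_t MatchingParen(const std::vector<Token>& tokens, std::size_t open) {
    int depth = 0;
    for (std::size_t i = open; i < tokens.size(); ++i) {
        if (tokens[i].kind != Kind::Punct) continue;
        if (tokens[i].text == "(") ++depth;
        else if (tokens[i].text == ")" && --depth == 0) return i;
    }
    return tokens.size();
}

// Keywords that may precede a call without being part of a type.
bool IsStatementKeyword(const std::string& s) {
    return s == "return" || s == "throw" || s == "else" || s == "case";
}

// Heuristically classifies the identifier at index 'name' when it is followed by '('.
FunctionUse Classify(const std::vector<Token>& tokens, std::size_t name) {
    if (name + 1 >= tokens.size() || tokens[name].kind != Kind::Identifier ||
        tokens[name + 1].text != "(")
        return FunctionUse::NotFunctionLike;
    std::size_t close = MatchingParen(tokens, name + 1);
    if (close == tokens.size()) return FunctionUse::NotFunctionLike;

    // A body directly after the parameter list: a definition.
    if (close + 1 < tokens.size() && tokens[close + 1].text == "{")
        return FunctionUse::Definition;

    // A type-like token directly before the name ("double pow(", "T& get("): a declaration.
    if (name > 0) {
        const Token& prev = tokens[name - 1];
        bool typeLike = (prev.kind == Kind::Keyword && !IsStatementKeyword(prev.text)) ||
                        prev.kind == Kind::Identifier || prev.text == "*" || prev.text == "&";
        if (typeLike) return FunctionUse::Declaration;
    }
    return FunctionUse::Call;  // e.g. "xsq = pow(x, 2);"
}
```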

Also, it would be helpful if somebody could suggest how I can find the type of a variable in the function's parameter list. E.g. if I have a function:

int myfun(char* c, T& t1) { /* ... */ }

then how can I get the tokens char and *, which together form the type of c? Similarly, how can I get the tokens T and &, which form the type of t1?
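Once a single parameter's tokens are isolated, a simple heuristic splits them: cut off any default argument at `=`, then take a trailing identifier as the parameter name and everything before it as the type. A minimal sketch with a hypothetical simplified Token type (whitespace assumed removed); SplitParameter and Parameter are illustrative names. It does not handle arrays (`int a[3]`), function pointers (`void (*f)(int)`), or an unnamed parameter whose type ends in a plain identifier (`const Foo`).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified token (whitespace already removed).
enum class Kind { Identifier, Keyword, Punct, Literal };

struct Token {
    Kind kind;
    std::string text;
};

struct Parameter {
    std::vector<std::string> type;  // e.g. {"char", "*"} or {"T", "&"}
    std::string name;               // empty for unnamed parameters
};

// Splits the tokens of ONE parameter (no top-level commas) into type and name.
Parameter SplitParameter(const std::vector<Token>& tokens) {
    // Drop a default argument: everything from the first '=' on.
    std::vector<Token> decl;
    for (const Token& t : tokens) {
        if (t.kind == Kind::Punct && t.text == "=") break;
        decl.push_back(t);
    }

    Parameter p;
    std::size_t typeEnd = decl.size();
    // A trailing identifier is the name, unless it is the last part of a
    // qualified type such as std::string (previous token is "::").
    if (decl.size() >= 2 && decl.back().kind == Kind::Identifier &&
        decl[decl.size() - 2].text != "::") {
        p.name = decl.back().text;
        typeEnd = decl.size() - 1;
    }
    for (std::size_t i = 0; i < typeEnd; ++i) p.type.push_back(decl[i].text);
    return p;
}
```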

EDIT: Here is a little more explanation of my question.

References:

The Boost.Wave documentation: http://www.boost.org/doc/libs/1_47_0/libs/wave/index.html

The list of token identifiers: http://www.boost.org/doc/libs/1_47_0/libs/wave/doc/token_ids.html

typedef boost::wave::cpplexer::lex_token<> token_type;
typedef boost::wave::cpplexer::lex_iterator<token_type> token_iterator;
typedef token_type::position_type position_type;

position_type pos(filename);

// instr holds the contents of the input file (e.g. read into a std::string);
// the lexer iterates over its characters
token_iterator it = token_iterator(instr.begin(), instr.end(), pos,
    boost::wave::language_support(
        boost::wave::support_cpp | boost::wave::support_option_long_long));
token_iterator end = token_iterator();

while (it != end) {
    boost::wave::token_id id = boost::wave::token_id(*it);

    switch (id) {
    // ...

    case boost::wave::T_IDENTIFIER:
        Match(id);  // consumes one token and increments the token_iterator
        // get the token id of the next token
        id = boost::wave::token_id(*it);
        // if an identifier is immediately followed by T_LEFTPAREN, treat it as a function
        if (id == boost::wave::T_LEFTPAREN) {
            Match(id);
            ParseFunction();  // this is the function I want to implement
            Match(boost::wave::T_RIGHTPAREN);
        }
        break;

    // ...
    }
}
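One thing that may bite in the snippet above: as far as I know, Wave's lexer also returns whitespace and comment tokens (such as T_SPACE, T_NEWLINE, T_CCOMMENT and T_CPPCOMMENT in the token identifier list), so in `foo (x)` the token after the identifier is a space, not T_LEFTPAREN. A minimal sketch of a Peek/Match cursor that skips such tokens; it uses a hypothetical stand-in enum instead of boost::wave::token_id so it compiles on its own, and the Parser class is illustrative, not part of Boost.Wave.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for boost::wave::token_id, reduced to what this sketch needs.
enum TokenId { T_IDENTIFIER, T_LEFTPAREN, T_RIGHTPAREN, T_COMMA,
               T_SPACE, T_NEWLINE, T_CCOMMENT, T_CPPCOMMENT, T_EOF };

struct Token {
    TokenId id;
    std::string text;
};

// True for tokens the parser should ignore.
bool IsSkippable(TokenId id) {
    return id == T_SPACE || id == T_NEWLINE || id == T_CCOMMENT || id == T_CPPCOMMENT;
}

// Hypothetical cursor over an already-lexed token vector.
class Parser {
public:
    explicit Parser(std::vector<Token> tokens) : toks_(std::move(tokens)) {}

    // Id of the next significant token, or T_EOF when none is left.
    TokenId Peek() {
        SkipIgnorable();
        return pos_ < toks_.size() ? toks_[pos_].id : T_EOF;
    }

    // Consumes the next significant token, which must have the expected id.
    Token Match(TokenId expected) {
        if (Peek() != expected || pos_ >= toks_.size())
            throw std::runtime_error("unexpected token");
        return toks_[pos_++];
    }

private:
    void SkipIgnorable() {
        while (pos_ < toks_.size() && IsSkippable(toks_[pos_].id)) ++pos_;
    }

    std::vector<Token> toks_;
    std::size_t pos_ = 0;
};
```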

So the question is: how do I implement ParseFunction()?
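Since the caller already consumes `(` before ParseFunction() and matches `)` afterwards, one way to implement it is the rule `parameter-list := parameter (',' parameter)*`: collect tokens up to the matching `)` and split them at commas that are not nested. A minimal sketch, assuming a hypothetical Token type with whitespace already dropped; ParseParameterList is an illustrative name. Treating `<`/`>` as brackets is a heuristic that keeps `std::map<int, int>` in one piece but would be confused by a less-than in a default argument.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical simplified token; only the text matters for this sketch.
struct Token {
    std::string text;
};

// Parses "parameter (',' parameter)*" starting at 'pos' (just after '(').
// Stops at the matching ')' WITHOUT consuming it, so the caller can Match it.
// Returns the tokens of each parameter; "()" yields no parameters.
std::vector<std::vector<Token>> ParseParameterList(const std::vector<Token>& toks,
                                                   std::size_t& pos) {
    std::vector<std::vector<Token>> params;
    std::vector<Token> current;
    int depth = 0;  // nesting of (), [], {} and (heuristically) <>
    for (; pos < toks.size(); ++pos) {
        const std::string& t = toks[pos].text;
        if (depth == 0 && t == ")") {
            if (!current.empty() || !params.empty()) params.push_back(current);
            return params;
        }
        if (depth == 0 && t == ",") {  // top-level comma ends a parameter
            params.push_back(current);
            current.clear();
            continue;
        }
        if (t == "(" || t == "[" || t == "{" || t == "<") ++depth;
        else if (t == ")" || t == "]" || t == "}" || t == ">") --depth;
        else if (t == ">>") depth -= 2;  // closes two template argument lists
        current.push_back(toks[pos]);
    }
    throw std::runtime_error("missing ')'");
}
```

Each collected parameter can then be split into its type tokens and its name.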

A. K.
  • You do realize that C++ is one of the hardest languages to parse correctly. – Martin York Jul 19 '11 at 23:44
  • My take is that the OP is not parsing C++. I suspect the OP has a limited grammar to parse (homework?), something akin to `bc`. – David Hammen Jul 19 '11 at 23:54
  • @Martin: I am not interested in parsing a lot of C++, just a few things to get me going. – A. K. Jul 20 '11 at 00:06
  • @David Hammen : not a homework. – A. K. Jul 20 '11 at 00:07
  • What, exactly are you trying to do? The question is a bit unclear as-is. Are you trying to build a parse tree? Build an expression evaluator? Build a scriptable interface to your program? Something else? Edit your question so we can help you. Otherwise we're just going to squabble about whether POSIX and C/C++ are in conflict with one another (which doesn't help you a bit). – David Hammen Jul 20 '11 at 01:13
  • @Martin: Though some parts of the C++ grammar are not too hard. – Sebastian Mach Jul 20 '11 at 06:28
  • @phresnel: You show me the simple part and I will show you the language that made it simpler. – Martin York Jul 20 '11 at 16:58
  • @Martin: not too hard != simple. Will you be suggesting D or [SPECS](http://www.csse.monash.edu.au/~damian/papers/HTML/ModestProposal.html)? – Sebastian Mach Jul 21 '11 at 05:57
  • @Martin: Of course it also depends on your target. For example, if you just want IntelliSense-like function signature lookup, you can get a long way with heuristic approaches that cover most cases. QtCreator does so, for example. – Sebastian Mach Jul 21 '11 at 06:00

1 Answer


If your system is POSIX-compliant (Linux, Mac OS X, Solaris, ...) you can use dlopen/dlsym to determine whether the symbol exists. You need to watch out for name mangling, and on some systems you need to be aware that, for example, the real name of sin is _sin.

dlsym is clueless about whether it has returned a pointer to a function or a pointer to some global variable. In fact, to use dlsym you will have to do something that is very much contrary to both the C and C++ standards: cast the void* pointer returned by dlsym to a function pointer. The POSIX standard is in conflict with C/C++ here. That said, on a POSIX-compliant system those void* pointers will convert to function pointers (otherwise the system is not POSIX-compliant).
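For reference, a minimal sketch of such a lookup, assuming a POSIX system with `<dlfcn.h>` (older glibc versions also need `-ldl` at link time). It looks up the C function strlen, which has no mangled name, in the running program via `dlopen(nullptr, ...)`, and stores the `void*` result through a `void**` as in the example on the POSIX dlsym page, instead of casting `void*` to a function pointer type directly. Whether that is well-defined is exactly what the comments below argue about; StrlenFn and LookupStrlen are illustrative names.

```cpp
#include <cstddef>
#include <dlfcn.h>

// Function-pointer type we ASSUME the symbol has; dlsym cannot check this.
using StrlenFn = std::size_t (*)(const char*);

// Looks up the unmangled C symbol "strlen" in the running program and the
// libraries it has loaded. Returns nullptr if the symbol is not found.
StrlenFn LookupStrlen() {
    void* self = dlopen(nullptr, RTLD_LAZY);  // handle for the main program
    if (self == nullptr) return nullptr;
    StrlenFn fn = nullptr;
    // Idiom from the POSIX dlsym example: write the void* into the storage of
    // the function pointer rather than casting void* to a function pointer.
    *reinterpret_cast<void**>(&fn) = dlsym(self, "strlen");
    return fn;  // the handle is deliberately left open while fn may be used
}
```

Note that nothing here verifies that strlen really has the signature StrlenFn claims; that knowledge has to come from somewhere else.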


Edit:

A huge gotcha: how do you call the thing you just found? How do you know how to handle the returned value, if there is one?

A simple example: suppose your input file contains xsq = pow (x, 2). You have to know ahead of time that the signature of pow is double pow (double, double).

Rather than using dlsym you are much better off handling a limited set of functions that you expressly build into your parser.
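Such a built-in set can be as small as a table from name to arity and implementation, so the parser checks the argument count itself instead of having to discover a signature. A minimal sketch, assuming every argument is a double; Builtin, Builtins and CallBuiltin are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct Builtin {
    std::size_t arity;                                        // expected argument count
    std::function<double(const std::vector<double>&)> call;  // implementation
};

// Functions the parser knows about; their signatures are fixed here, not discovered.
const std::map<std::string, Builtin>& Builtins() {
    static const std::map<std::string, Builtin> table = {
        {"pow",  {2, [](const std::vector<double>& a) { return std::pow(a[0], a[1]); }}},
        {"sqrt", {1, [](const std::vector<double>& a) { return std::sqrt(a[0]); }}},
        {"sin",  {1, [](const std::vector<double>& a) { return std::sin(a[0]); }}},
    };
    return table;
}

// Evaluates a call such as pow(x, 2) once the parser has evaluated the arguments.
double CallBuiltin(const std::string& name, const std::vector<double>& args) {
    auto it = Builtins().find(name);
    if (it == Builtins().end()) throw std::runtime_error("unknown function: " + name);
    if (args.size() != it->second.arity)
        throw std::runtime_error("wrong number of arguments to " + name);
    return it->second.call(args);
}
```

With this, `xsq = pow (x, 2)` becomes `CallBuiltin("pow", {x, 2.0})`, and a call with the wrong number of arguments is rejected instead of silently misbehaving.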

Lightness Races in Orbit
David Hammen
  • The way POSIX recommends to cast function pointer types falls within implementation-defined behavior, [or so I believe](http://stackoverflow.com/questions/6734492/c-callback-to-function-template-explicitly-instantiate-template/6735302#6735302). – Luc Danton Jul 20 '11 at 00:04
  • @Luc: You believe incorrectly. POSIX assumes (requires!) that function pointers be the same size as data pointers. There are many machines that do not support this. Harvard architecture machines, for instance. Casting a data pointer to a function pointer is undefined behavior, pure and simple. See http://www.trilithium.com/johan/2004/12/problem-with-dlsym/ . – David Hammen Jul 20 '11 at 00:12
  • But the recommended way to 'cast' function pointer types is to precisely sidestep the issue and cast from pointer to pointer to function to `void*`. Have you read my link? – Luc Danton Jul 20 '11 at 00:17
  • @Luc: In fact, it is worse than UB. In C++ it is out-and-out illegal. C++ 2003: 5.4/5: "Any type conversion not mentioned below and not explicitly defined by the user (12.3) is ill-formed." There is no conversion from data pointer to function pointer, or vice versa. It is strictly forbidden by the C++ standard. – David Hammen Jul 20 '11 at 00:22
  • @Luc: Yes, I read your link. Did you read mine? (Second comment). The POSIX standard is in conflict with both the C and C++ standards. You cannot legally cast a function pointer to/from void* in C or C++. Compilers on POSIX-compliant systems are faced with a lose-lose situation: if they truly are compliant with the C/C++ standards they cannot be POSIX-compliant, and if they are POSIX-compliant they are in violation of the C/C++ standards. – David Hammen Jul 20 '11 at 00:30
  • I'm parsing a text (.cpp) file. Why would I need dlopen() etc.? I read through the description of dlopen(), and I'm still not sure that I need it. – A. K. Jul 20 '11 at 00:31
  • @David What makes you think I did not already understand and know all this (and yes I did read the link)? What do you make of the [example code](http://pubs.opengroup.org/onlinepubs/009695399/functions/dlsym.html) from the specifications themselves? Can you see the difference between `void(*)()` and `void(**)()`? – Luc Danton Jul 20 '11 at 00:34
  • @David: In that quote, "type conversion mentioned below" includes `reinterpret_cast`. The specification for `reinterpret_cast` says: "Converting a function pointer to an object pointer type or vice versa is conditionally-supported. The meaning of such a conversion is implementation-defined..." – Ben Voigt Jul 20 '11 at 00:46
  • @Ben: You are reading a different standard than I am. I'm reading the 2003 standard, you are reading the draft C++0x standard. – David Hammen Jul 20 '11 at 00:53
  • @David: even in C++03, it's legal to `reinterpret_cast` from `void*` to `intptr_t` and from `intptr_t` to a function pointer. The result is implementation-defined, but such a program is not ill-formed. Therefore I don't believe that a compiler has to choose between POSIX compliance and C++ compliance. – Ben Voigt Jul 20 '11 at 00:54
  • @David: In addition, your whole answer seems completely irrelevant to the question. Nothing in the question suggests that the expression is evaluated, or in fact that the function is called with any parameters whatsoever. Finding function calls in C++ source code is useful for generating call trees, performing dead code identification, source browsing (think `ctags`), and so on. – Ben Voigt Jul 20 '11 at 00:59
  • Did you read the article to which I linked in comment number 2? Here's another, straight from WG21: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#195 . "In C the cast results in undefined behavior and thus does not require a diagnostic, and Unix C compilers generally do not issue one. This fact is used in the definition of the standard Unix function dlsym, which is declared to return void* but in fact may return either a pointer to a function or a pointer to an object. The fact that C++ compilers are required to issue a diagnostic is viewed as a "competitive disadvantage"." – David Hammen Jul 20 '11 at 01:04