
I'm writing a Python script that parses Java classes from a backend service to extract some information. One of the things I need to extract is the request parameter of a Java method.

public\s+[\w<>\[\]]+\s+(\w+)\s*\(([^{]+)(^(@ApiParam(.*)\))|^(@PathParam(.*))|^(@QueryParam(.*))|(@\w+\s+)?)((\w+)\s+(\w+))

This is what I have so far. It already gives me the method parameters between the parentheses (), but I cannot filter out the @ApiParam and @QueryParam annotations.

/*Some annotations*/
public PortfolioSuggestion calculatePortfolioSuggestion(
        @ApiParam(required = true,
                  value = "Request containing the answers which were answered by the user and an\n" +
                          "investment for which suggestion should be calculated")
        @Valid @NotNull PortfolioSuggestionRequest portfolioSuggestionRequest,
        @ApiParam(value = "The optional product key")
        @QueryParam("product") ProductType productType)
        throws SuggestionsCalculationException {

The request parameter is always the first parameter which is not annotated with @ApiParam or @QueryParam. In this case my targets would be PortfolioSuggestionRequest and portfolioSuggestionRequest. The annotations @Valid and @NotNull are optional and may be omitted.

Chau362

1 Answer


TL;DR: Regexps are not powerful enough for your use case.

Any regular expression in the formal sense is equivalent to a deterministic finite automaton, which has no memory for tracking how deeply something is nested.

Regexps are not always suited to parsing code. Matching nested constructs, such as the parentheses inside your annotations, sometimes requires a pushdown automaton, which a regexp doesn't provide. You might want to look into ANTLR, which is a full-featured parser generator.

See this question for a similar example.

Here is a GitHub repo that aims to parse Java using ANTLR.
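If a full parser is more than you need, a small hand-written scanner with a depth counter can stand in for the stack that a regexp lacks. The following is only a rough sketch, not a real Java parser: all function names are made up for illustration, it ignores comments, char literals and the `final` modifier, and it assumes that only parameters carrying @QueryParam or @PathParam should be skipped (in your example the target parameter itself carries @ApiParam).

```python
import re

OPENERS, CLOSERS = "(<[{", ")>]}"
# Flat part of the signature: a regexp is fine up to the opening parenthesis.
METHOD_START = re.compile(r"public\s+[\w<>\[\]]+\s+(\w+)\s*\(")
ANNOTATION = re.compile(r"\s*@([\w.]+)\s*")
SKIPPED = {"QueryParam", "PathParam"}  # assumption, adjust to your rules

def scan(text, start=0):
    """Yield (index, char, depth) for every character outside string literals."""
    depth, in_string, i = 0, False, start
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        else:
            if ch in OPENERS:
                depth += 1
            elif ch in CLOSERS:
                depth -= 1
            yield i, ch, depth
        i += 1

def end_of_group(text, start):
    """Return the index just past the bracket that closes text[start]."""
    for i, ch, depth in scan(text, start):
        if depth == 0 and ch in CLOSERS:
            return i + 1
    raise ValueError("unbalanced brackets")

def split_top_level(text):
    """Split on commas that are not nested inside brackets or strings."""
    parts, last = [], 0
    for i, ch, depth in scan(text):
        if ch == "," and depth == 0:
            parts.append(text[last:i])
            last = i + 1
    parts.append(text[last:])
    return [p.strip() for p in parts if p.strip()]

def strip_annotations(param):
    """Return (annotation names, remaining 'Type name' text) for one parameter."""
    names, pos = [], 0
    while True:
        m = ANNOTATION.match(param, pos)
        if not m:
            return names, param[pos:].strip()
        names.append(m.group(1).rsplit(".", 1)[-1])
        pos = m.end()
        if pos < len(param) and param[pos] == "(":
            pos = end_of_group(param, pos)  # skip the annotation arguments

def find_request_parameter(source):
    """Return (type, name) of the first parameter without a skipped annotation."""
    m = METHOD_START.search(source)
    if not m:
        return None
    open_idx = m.end() - 1
    params = source[open_idx + 1:end_of_group(source, open_idx) - 1]
    for param in split_top_level(params):
        names, rest = strip_annotations(param)
        if SKIPPED.isdisjoint(names):
            type_part, name = rest.rsplit(None, 1)
            return type_part, name
    return None

SAMPLE = r'''
public PortfolioSuggestion calculatePortfolioSuggestion(
        @ApiParam(required = true,
                  value = "Request containing the answers which were answered by the user and an\n" +
                          "investment for which suggestion should be calculated")
        @Valid @NotNull PortfolioSuggestionRequest portfolioSuggestionRequest,
        @ApiParam(value = "The optional product key")
        @QueryParam("product") ProductType productType)
        throws SuggestionsCalculationException {
'''

print(find_request_parameter(SAMPLE))
# ('PortfolioSuggestionRequest', 'portfolioSuggestionRequest')
```

The regexp only handles the flat part of the signature, and the depth counter takes over wherever brackets can nest. For anything beyond simple signatures, a real grammar such as the ANTLR one is the safer route.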

Matthias Beaupère