2

I have a string in a Python script which contains some Java code.
How can I extract the base Java class name from it in order to execute it using subprocess?
I think it can be achieved using regex, but I don't know how.

Sample:

a = """
import java.util.Scanner;
class sample{}
class second
{
    static boolean check_prime(int a)
    {
        int c=0;
        for (int i=1;i<=a; i++) {
            if(a%i==0)
                c++;
        }
        if(c == 2)
            return true;
        else
            return false;
    }
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.println("Enter two numbers");
        int a = in.nextInt();
        int b = in.nextInt();
        if(check_prime(a) && check_prime(b))
        {
            if(b-a==2 || a-b==2)
                System.out.println("They are twin primes");
            else
                System.out.println("They are not twin primes");
        }
        else
            System.out.println("They might not be prime numbers");
    }
}
"""

5 Answers

2

A main class is a class which contains the public static void main function.

If it is possible in your environment, you could use a library that can parse Java source code, such as plyj or javalang:

#!/usr/bin/env python
import javalang # $ pip install javalang

java_source = a  # the string holding the Java code (named `a` in the question)
tree = javalang.parse.parse(java_source)
name = next(klass.name for klass in tree.types
            if isinstance(klass, javalang.tree.ClassDeclaration)
            for m in klass.methods
            if m.name == 'main' and m.modifiers.issuperset({'public', 'static'}))
# -> 'second'

If the Java source starts with a package declaration such as package your_package; (so that the full class name is your_package.second), you can get the package name as tree.package.name.

Or you could use a parser generator such as grako and specify a Java grammar subset that is enough to get the class name in your case. If the input is highly regular, you could try a regex and expect it to fail if your assumptions about the structure of the code are wrong.
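Once the class name (and the optional package name) is known, running the code from Python could look roughly like the sketch below. It is not part of the answer above: the helper names qualified_name, build_commands and compile_and_run are hypothetical, and it assumes that javac and java are on PATH and that the source contains no public top-level class with a different name.

```python
import subprocess
import tempfile
from pathlib import Path


def qualified_name(class_name, package=None):
    """Join an optional package name and a class name into a full class name."""
    return f"{package}.{class_name}" if package else class_name


def build_commands(source_path, out_dir, full_name):
    """Return the javac and java argument lists (assumes both are on PATH)."""
    compile_cmd = ["javac", "-d", str(out_dir), str(source_path)]
    run_cmd = ["java", "-cp", str(out_dir), full_name]
    return compile_cmd, run_cmd


def compile_and_run(java_source, class_name, package=None, stdin_text=""):
    """Write the source to a temporary file, compile it, and run the main class."""
    with tempfile.TemporaryDirectory() as tmp:
        # Assumption: naming the file after the main class is acceptable,
        # i.e. no other top-level class in the source is declared public.
        source_path = Path(tmp) / f"{class_name}.java"
        source_path.write_text(java_source)
        compile_cmd, run_cmd = build_commands(
            source_path, tmp, qualified_name(class_name, package))
        subprocess.run(compile_cmd, check=True)
        result = subprocess.run(run_cmd, input=stdin_text, text=True,
                                capture_output=True, check=True)
        return result.stdout
```

With the string from the question, a call such as compile_and_run(a, "second", stdin_text="3 5\n") would feed the two numbers to the program on standard input and return whatever it prints.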

jfs
  • This seems to work, but I was wondering if it isn't overkill. Also, is it as fast as a regex? – user2444327 Oct 29 '15 at 18:01
  • @user2444327: it depends. Measure time performance and see whether it is fast enough in your case. It might be overkill, but if there are no restrictions on dependencies then it is easier to add `javalang` to your `requirements.txt`, use the code that I've provided and forget about it. On the other hand, if the input is simple then write a simple regex and expand it if necessary on a case by case basis. – jfs Oct 29 '15 at 18:07
1

Using only regex is hardly ever going to work. As a basic example of why it could not, consider this:

public class A {
     public static void ImDoingThisToMessYouUp () {
          String s = "public static void main (String[] args) {}";
     }
}

public class B {
      public static void main (String[] args) {}
}

You get the idea... A regex can always be fooled into believing it has found something that isn't really what you are looking for. You must rely on more advanced libraries for parsing.
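To make the failure concrete, here is a small sketch of a deliberately naive regex-based detector run on the example above. The function naive_main_class is a hypothetical illustration, not code from any of the answers.

```python
import re

TRICKY_SOURCE = '''
public class A {
     public static void ImDoingThisToMessYouUp () {
          String s = "public static void main (String[] args) {}";
     }
}

public class B {
      public static void main (String[] args) {}
}
'''


def naive_main_class(java_source):
    """Return the first class whose following text mentions a main method.

    Deliberately naive: it cannot tell code from string literals or comments.
    """
    declarations = list(re.finditer(r"\bclass\s+(\w+)", java_source))
    for index, match in enumerate(declarations):
        # The "body" is everything up to the next class declaration.
        end = (declarations[index + 1].start()
               if index + 1 < len(declarations) else len(java_source))
        body = java_source[match.end():end]
        if re.search(r"public\s+static\s+void\s+main\b", body):
            return match.group(1)
    return None


print(naive_main_class(TRICKY_SOURCE))  # prints A, although the real main class is B
```

The string literal inside A is enough to mislead the pattern, which is the kind of false positive a real parser avoids.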

I'd go with J.F. Sebastian's answer.

spalac24
0

Here's a crude way:

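import re

b = a.split()                            # split the source into whitespace-separated tokens
token = b[b.index("class") + 1]          # the token right after the first "class" keyword
javaclass = re.sub(r"\{.*$", "", token)  # drop "{" and anything after it
print(javaclass)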

...which essentially takes all the words and finds the first word after the first occurrence of "class". It also removes "{" and anything after it, in case you have a situation like

class MyClass{

However, you would need to do a lot more if you have multiple classes in a file.

ergonaut
  • The whitespace is optional after the class name, as this is perfectly legal and common `class MyClass{ ....` so your method will fail in a substantial number of cases. – Brent C Oct 29 '15 at 17:17
0

As I said in my comment, use re.findall() like this:

re.findall(r'class (\w*)', a)

As the function name suggests, findall() finds all of the class names. Using \w here matches only word characters (letters, digits and underscores), which works better than .* if you're writing class MyClass{.
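A minimal, self-contained sketch of that call on a shortened version of the question's source. The trimmed java_source string and the use of \w+ instead of \w* are my own choices for the illustration.

```python
import re

# Shortened stand-in for the Java source from the question.
java_source = """
import java.util.Scanner;
class sample{}
class second
{
    public static void main(String[] args) {}
}
"""

# \w+ stops at the first non-word character, so "sample{}" yields "sample".
class_names = re.findall(r"class (\w+)", java_source)
print(class_names)  # ['sample', 'second']
```

This lists every declared class, so it still does not say which one contains main.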


To find the main class, split the source at each class declaration like this:

for i in re.split(r'\nclass ', a)[1:]:                  # each chunk starts with a class name, followed by that class's code
    if re.search(r'\n\s*public static void main', i):  # check whether the chunk declares 'public static void main'
        print(re.search(r'(\w*)', i).group(1))         # print the class name at the start of the chunk

A simpler way, as a one-line list comprehension:

[re.search(r'(\w*)', i).group(1) for i in re.split(r'\nclass ', a) if re.search(r'\n\s*public static void main', i)]
Remi Guan
  • Sorry, I had just left :) – Remi Guan Oct 29 '15 at 17:46
  • It's not working, if I print `i` inside the loop it prints the whole code. – user2444327 Oct 29 '15 at 17:55
  • @user2444327 Don't print `i`. As I said in my answer, `print(re.search('class (\w*)', i).group(1))`. – Remi Guan Oct 29 '15 at 17:56
  • I was printing i with it. – user2444327 Oct 29 '15 at 17:59
  • Really? Let me test again. – Remi Guan Oct 29 '15 at 18:01
  • Well, I've tested it in Python 2.7 and Python 3.5 and it works fine. What about the list comprehension solution? – Remi Guan Oct 29 '15 at 18:03
  • I have updated the string, can you check it against that one, also try to place the sample class after the second class. – user2444327 Oct 29 '15 at 18:04
  • Oops, I understand the problem now. My solution thinks that the `second` class is inside the `sample` class. Let me fix it. – Remi Guan Oct 29 '15 at 18:08
  • I don't think this really always works... The part with `{.*}` is greedy, so it will match the first `{` it encounters with the last `}`, thus always printing the first class in the file and finding nothing more. Using `.*?` instead also won't work, because it will match the first `}`, which could be closing a function. In general, you can't really do brace balancing with regex, so you'll have to rely on more advanced parsing options. – spalac24 Oct 29 '15 at 18:16
  • @Santiago That's the problem, so I'm trying to use `.split('class')` now. And now this works. – Remi Guan Oct 29 '15 at 18:18
  • @KevinGuan not really, since the word `class` can be found anywhere inside the code. E.g. `String cl = "class";`. – spalac24 Oct 29 '15 at 18:19
  • Especially if someone uses -> `String cl = "class sample{public static void main}";` – user2444327 Oct 29 '15 at 18:23
0

An approximate solution to the problem is possible with regular expressions, as you guessed. However, there are a few tricks to keep in mind:

  1. A class name may not be followed by whitespace, since MyClass{ is legal and common
  2. A type parameter can be provided after the class name, such as MyClass<T>, and the compiled .class file's name will not be affected by this type parameter
  3. A file may have more than one top-level class, but at most one of them may be declared public, and that public class must have the same name as the file
  4. The public class that has the same name as the file may have inner classes (which may even be public), but these must necessarily come after the outer class declaration.

These tips lead to searching for the first occurrence of the phrase public class, capturing the next run of characters, then looking for whitespace, a { or < character.

This is what I came up with (may be a bit ugly):

public\s*(?:abstract?)?\s*(?:static?)?\s*(?:final?)?\s*(?:strictfp?)?\s*class\s*(\w.*)\s*,?<.*$
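The search described above can also be sketched in Python with a simplified pattern. This is not the author's regex: PUBLIC_CLASS and first_public_class are hypothetical names, and the pattern only handles the modifiers listed in it. It still cannot distinguish real code from string literals or comments.

```python
import re

# "public", then any of the listed modifiers, then "class" and the name.
# The name capture (\w+) stops at whitespace, "{" or "<".
PUBLIC_CLASS = re.compile(
    r"\bpublic\s+"
    r"(?:(?:abstract|static|final|strictfp)\s+)*"
    r"class\s+"
    r"(\w+)"
)


def first_public_class(java_source):
    """Return the name of the first public class, or None if there is none."""
    match = PUBLIC_CLASS.search(java_source)
    return match.group(1) if match else None


print(first_public_class("public final class MyClass<T>{ }"))  # MyClass
```

For the source in the question this returns None, because neither class there is declared public.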

Brent C
  • it is not necessary for the class to be public (as the code example in the question shows), you could even have [multiple classes with `public static main()` method](http://stackoverflow.com/a/2324915) – jfs Oct 29 '15 at 19:21
  • @J.F.Sebastian yes, that is true. But how would you execute that file using `subprocess` because there would be no way to obtain the filename from that source code? The filename will always be the same as the public class in that file, and executing that file will cause `main` to be called regardless of which class it is in. – Brent C Nov 03 '15 at 15:34
  • Click the link. It explicitly shows an example of how you can execute the code in such cases. Anyway, the source is not in any file in the OP's case. – jfs Nov 03 '15 at 15:37
  • @J.F.Sebastian How would you execute that Java code using `subprocess` if it isn't in a file? – Brent C Nov 03 '15 at 15:47
  • E.g., save it to a file. I don't see how your comments are related to my first comment in any way. Could you elaborate on how exactly they relate to *"it is not necessary for the class to be public (as the code example in the question shows), you could even have multiple classes with public static main() method"*? – jfs Nov 03 '15 at 15:54