
UPDATE: this problem occurs when running JVM-based command-line tools in a Cygwin bash shell. Although I originally thought it was related to Scala, it is specific to the Windows JVM. It might be the result of breaking changes in MSDN libraries; see the comments below.

I'm writing a Scala utility script that takes a literal Java classpath entry and analyzes it. I'd like my main method to be able to receive command-line arguments with a trailing asterisk, e.g., "/*", but there seems to be no way to do it when running in a Cygwin bash session.

Here's my scala test script, which displays command line arguments:

# saved to a file called "dumpargs.sc"
args.foreach { printf("[%s]\n",_) }

I'd like to be able to call it with an asterisk as an argument, like this:

scala -howtorun:script dumpargs.sc "*"

When I run this in a CMD.EXE shell, it does what I expect:

c:\cygwin> scala.bat -howtorun:script dumpargs.sc "*"
[*]
c:\cygwin>

Likewise, when tested in a Linux bash shell, the sole command line argument consists of a single bare asterisk, again as expected.

A comparable command-line args dumper program written in C prints a single bare asterisk, regardless of which shell it is run from (CMD.EXE or bash).

But when the same test is run in a Cygwin bash shell, the asterisk is globbed, listing all the files in the current directory. The globbing happens somewhere downstream from bash, since otherwise the C dumper program would also have failed.

The problem is subtle: it happens somewhere in the JVM, after it receives the asterisk argument and before it calls the main method. But the JVM only globs the asterisk based on something in the running shell environment.
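The claim that bash is not the one expanding the asterisk can be checked without any JVM involved. The following is a minimal sketch, assuming a plain bash session; `dumpargs` is a hypothetical shell function standing in for the C dumper program, not something from the original test setup.

```shell
#!/bin/bash
# dumpargs: print each received argument on its own line, wrapped in brackets.
dumpargs() {
  printf '[%s]\n' "$@"
}

# Quoted asterisks are passed through literally; bash does not glob them.
dumpargs "*"
dumpargs 'lib/*'
```

If both calls print the pattern unchanged, any expansion seen with a JVM-based tool must be happening after bash has handed over the argument.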

In some ways, this behaviour is a good thing, since it supports script-portability, by hiding differences in the runtime environments, Windows versus Linux/OSX, etc (unix-like shells tend to glob, whereas CMD.EXE doesn't).

All efforts to work around the problem so far have failed.

Even allowing for OS-dependent tricks, I've tried all of the following (from a bash session):

"*" '*' '\*' '\\*'

The following almost works, but the single quotes arrive as part of the argument value and must then be stripped away by my program:

"'*'"

Same problem here, but a different kind of unwanted quote gets through:

'"*"' or \"*\"

What's needed is a system property, or some other mechanism to disable globbing.

By the way, one variation of this problem is the inability to take advantage of the nice way a directory of jar files can be added to the classpath (since java 1.6), by specifying "-classpath 'lib/*'".
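One possible way around the classpath variation is to expand the jar list in bash and pass explicit paths, so that no asterisk ever reaches the JVM. This is only a sketch under assumptions: `build_classpath` is a hypothetical helper, the separator is passed in explicitly because a Windows JVM expects `;` where Linux expects `:`, and whether the resulting paths are acceptable to a Windows JVM launched from Cygwin has not been verified here.

```shell
#!/bin/bash
# build_classpath DIR [SEP]: join every .jar directly inside DIR using SEP
# (SEP defaults to ':'; a Windows JVM would need ';').
build_classpath() {
  local dir=$1 sep=${2:-:} jar result=
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue          # skip the unexpanded pattern when nothing matches
    result+="${result:+$sep}$jar"      # prepend the separator only after the first entry
  done
  printf '%s' "$result"
}
```

It would be used along the lines of `java -cp "$(build_classpath lib ';')" ...`, with the quotes keeping the `;` from being treated as a command separator by bash.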

There needs to be a system property I can set to disable this behavior when running in a shell environment that provides its own globbing.

philwalk
  • The problem goes away if testing in a Linux session. Also, when calling scala.bat from a CMD.EXE session, the asterisk can be double quoted to avoid the problem. This may indicate an opportunity to fix this in the bash scala script, since scala.bat can be made to work. – philwalk May 04 '16 at 21:06
  • see https://msdn.microsoft.com/en-US/library/ms235497(v=VS.80).aspx – philwalk May 06 '16 at 20:00
  • This may be the source of the bug: https://bugs.openjdk.java.net/browse/JDK-7167744, although it refers to Java 1.7 and the problem still exists in the latest 1.8 as of May 2016. – philwalk May 06 '16 at 20:01
  • See this for more info: http://stackoverflow.com/questions/25948706 – philwalk May 06 '16 at 20:22
  • As of java 1.8.0_52, it is now possible to specify "-classpath 'lib\\*'" although it's necessary to use a backslash (perhaps this works in earlier java versions, I haven't tested that). It's odd because java command line accepts forward slashes for classpaths in general, just not in this specific case. – philwalk Dec 06 '17 at 18:01

1 Answer

This problem is caused by a known bug in the JVM, documented here:

https://bugs.openjdk.java.net/browse/JDK-8131329

In the meantime, to get around the problem, I'm passing arguments via an environment variable.

Here's what happens inside my "myScalaScript":

#!/usr/bin/env scala
for( arg <- args.toList ::: cpArgs ){
  printf("[%s]\n",arg)
}

lazy val cpArgs = System.getenv("CP_ARGS") match {
  case null => Nil
  case text => text.split("[;|]+").toList
}

Here's how the script is invoked from bash:

CP_ARGS=".|./lib/*" myScalaScript [possibly other-non-problematic-args]

and here's what it prints in all tested environments:

[.]
[./lib/*]

Here's a better fix that hides all the nastiness inside the script and is a bit more conventional in the main loop.

The new script:

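#!/bin/bash
# Pack all arguments into one '|'-separated string so the split below recovers each one.
export CP_ARGS="$(IFS='|'; printf '%s' "$*")"
exec "$(which scala)" "$0"
!#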
// vim: ft=scala

for( arg <- cpArgs ){
  printf("[%s]\n",arg)
}

lazy val cpArgs = System.getenv("CP_ARGS") match {
  case null => Nil
  case text => text.split("[;|]+").toList
}
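The packing and unpacking that this workaround relies on can be tried in plain bash before involving the JVM. This is a minimal sketch: `join_args` and `split_args` are hypothetical helper names, and `split_args` only approximates the `[;|]+` split done in the Scala code (it does not collapse repeated separators).

```shell
#!/bin/bash
# join_args: pack all arguments into a single string separated by '|'.
join_args() {
  local IFS='|'
  printf '%s' "$*"
}

# split_args: unpack a '|'- or ';'-separated string and print one bracketed item per line.
split_args() {
  local IFS='|;'
  local -a items
  read -r -a items <<< "$1"   # read splits on IFS and performs no filename expansion
  printf '[%s]\n' "${items[@]}"
}

packed=$(join_args . './lib/*')
split_args "$packed"
```

Because the value travels in an environment variable rather than on the command line, the asterisk in `./lib/*` should survive the round trip unexpanded.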
philwalk
  • because this is a workaround rather than a fix, I'll wait for a (hopefully) better answer. – philwalk May 06 '16 at 20:49
  • BTW, I consider the jvm bug to be quite severe, as it prevents the use of the "all jars below a directory" classpath shortcut, e.g., -cp "./lib/*" in a windows bash shell. – philwalk May 06 '16 at 21:05
  • "all jars below a directory" idiom verified to work in Windows with java 1.8.0_52 (possibly earlier) but only if backslash is used. Forward slash works in windows java for specifying classpath in general, but not for this particular idiom. – philwalk Dec 06 '17 at 18:03