
Is there a trick to using file paths that contain spaces with Mallet from the terminal on a Mac?

For example, all of the following give me errors:

escaping the space

./bin/mallet import-dir --input /Volumes/Macintosh\ HD/Users/MY_NAME/Desktop/en --output /Users/MY_NAME/Desktop/en.mallet --remove-stopwords TRUE --keep-sequence TRUE

double quotes, no escapes

./bin/mallet import-dir --input "/Volumes/Macintosh HD/Users/MY_NAME/Desktop/en" --output /Users/MY_NAME/Desktop/en.mallet --remove-stopwords TRUE --keep-sequence TRUE

double quotes with an escaped space

./bin/mallet import-dir --input "/Volumes/Macintosh\ HD/Users/MY_NAME/Desktop/en" --output /Users/MY_NAME/Desktop/en.mallet --remove-stopwords TRUE --keep-sequence TRUE

and finally, single quotes with an escaped space

./bin/mallet import-dir --input '/Volumes/Macintosh\ HD/Users/MY_NAME/Desktop/en' --output /Users/MY_NAME/Desktop/en.mallet --remove-stopwords TRUE --keep-sequence TRUE

They all want to treat the folder as multiple folders, split on the space:

Labels = 
   /Volumes/Macintosh\
   HD/Users/MY_NAME/Desktop/en
Exception in thread "main" java.lang.IllegalArgumentException: /Volumes/Macintosh\ is not a directory.
    at cc.mallet.pipe.iterator.FileIterator.<init>(FileIterator.java:108)
    at cc.mallet.pipe.iterator.FileIterator.<init>(FileIterator.java:145)
    at cc.mallet.classify.tui.Text2Vectors.main(Text2Vectors.java:322)

Is there any way around this, other than renaming all of my files and folders to replace spaces with underscores? (I understand that I don't need to type /Volumes/Macintosh\ HD/... and can just start at /Users. This was just an example.)

bigfoot56

2 Answers


The issue is that import-dir is designed to take multiple directories as input. The argument parser would need a way to distinguish this use case from the "escaped space" use case, keeping in mind that Windows paths can end in \.

The best way to support both cases might be to add a --single-input option that would take its argument as a single string.

I also find that the spreadsheet-style import-file command is almost always preferable to working with directories.
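To illustrate the import-file route, here is a minimal sketch that flattens a folder of text files into a single file with one instance per line (name, label, text). The scratch paths, the "en" label, and the tab-separated layout are assumptions for the demo; check the format against the MALLET import documentation before relying on it.

```shell
#!/bin/sh
set -eu

# Scratch area standing in for the real corpus; the folder name contains a space.
WORK="$(mktemp -d "${TMPDIR:-/tmp}/mallet_demo.XXXXXX")"
SRC="$WORK/My Corpus/en"
mkdir -p "$SRC"
printf 'first line\nsecond line\n' > "$SRC/doc one.txt"
printf 'another document\n' > "$SRC/doc2.txt"

# Output file: one document per line, fields separated by tabs.
OUT="$WORK/en.tsv"
: > "$OUT"

for f in "$SRC"/*.txt; do
    # Instance name: the file name, with spaces replaced so it stays one token.
    name="$(basename "$f" | tr ' ' '_')"
    # Document text: newlines and tabs become spaces so the text fits on one line.
    text="$(tr '\n\t' '  ' < "$f")"
    # "en" is a placeholder label (assumption); use whatever label suits your data.
    printf '%s\t%s\t%s\n' "$name" "en" "$text" >> "$OUT"
done

cat "$OUT"

# The result could then be imported with something like (not run here):
# ./bin/mallet import-file --input "$OUT" --output en.mallet --keep-sequence TRUE --remove-stopwords TRUE
```

Because the spaces are handled while building the file, the folder path itself is never passed to MALLET.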

David Mimno

As a workaround, you could:

(1) write some code to read the directory contents and generate a single examples file for use with:

bin/mallet import-file

Here's the Mallet quick-start page on importing data, which describes the import-file version: http://mallet.cs.umass.edu/import.php

(2) Create a symbolic link to the folder at a location whose path contains no spaces
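
A rough sketch of option (2), using a scratch directory so it can be run safely. The paths and the link name en_link are made up for the demo, and it assumes your temp directory path itself has no spaces; substitute your real folder for SRC.

```shell
#!/bin/sh
set -eu

# Scratch area standing in for the real corpus; the folder name contains a space.
WORK="$(mktemp -d "${TMPDIR:-/tmp}/mallet_demo.XXXXXX")"
SRC="$WORK/Macintosh HD/en"
mkdir -p "$SRC"
printf 'hello world\n' > "$SRC/doc1.txt"

# Create a link whose own path has no spaces and that points at the real folder.
LINK="$WORK/en_link"
ln -s "$SRC" "$LINK"

# The files are now reachable through the space-free path.
ls "$LINK"

# MALLET could then be pointed at the link instead of the original folder (not run here):
# ./bin/mallet import-dir --input "$LINK" --output en.mallet --remove-stopwords TRUE --keep-sequence TRUE
```

The original folder is left untouched, so nothing has to be renamed; deleting the link later does not delete the files.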

Edward Ross