
I'm trying to use Cloud AI Platform for training (`gcloud ai-platform jobs submit training`). I created my bucket and have confirmed that the training file is there (`gsutil ls gs://sat3_0_bucket/data/train_input.csv`).

However, my job is failing with this log message:

    File "/root/.local/lib/python3.7/site-packages/ktrain/text/data.py", line 175, in texts_from_csv
        with open(train_filepath, 'rb') as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'gs://sat3_0_bucket/data/train_input.csv'

Am I missing something?

1 Answer


The error is probably happening because ktrain tries to auto-detect the character encoding using `open(train_filepath, 'rb')`, which may be problematic with Google Cloud Storage. One solution is to explicitly provide the `encoding` argument to `texts_from_csv` so this step is skipped (the default is `None`, which means auto-detect).
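A minimal sketch of that workaround (the `text_column` and `label_columns` values here are placeholders; substitute your CSV's actual column names):

    from ktrain import text

    # Supplying the encoding explicitly skips ktrain's open(train_filepath, 'rb')
    # auto-detection step, which fails on gs:// paths.
    (x_train, y_train), (x_val, y_val), preproc = text.texts_from_csv(
        'gs://sat3_0_bucket/data/train_input.csv',
        text_column='text',           # placeholder column name
        label_columns=['label'],      # placeholder column name
        encoding='utf-8',             # skip auto-detection (default is None)
    )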

Alternatively, you can read the data in yourself as a pandas DataFrame. pandas evidently supports GCS, so you can simply do this: `df = pd.read_csv('gs://bucket/your_path.csv')`

Then, using ktrain, you can call `ktrain.text.texts_from_df` (or `ktrain.text.texts_from_array`) to load and preprocess your data.
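A sketch of the pandas route, assuming `gcsfs` is installed so pandas can open `gs://` paths (column names are again placeholders):

    import pandas as pd
    from ktrain import text

    # pandas reads straight from GCS when gcsfs is available
    df = pd.read_csv('gs://sat3_0_bucket/data/train_input.csv')

    (x_train, y_train), (x_val, y_val), preproc = text.texts_from_df(
        df,
        text_column='text',           # placeholder column name
        label_columns=['label'],      # placeholder column name
    )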

blustax
  • Thank you so much! Your suggestion worked, but it also required `gcsfs` to be installed via setup.py. The error was: `ImportError: Missing optional dependency 'gcsfs'. The gcsfs library is required to handle GCS files Use pip or conda to install gcsfs.` From the pandas docs: `gcsfs: necessary for Google Cloud Storage access (gcsfs >= 0.1.0).` – Jasmine Thomson Jun 20 '20 at 01:38
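For anyone hitting the same `ImportError`: a sketch of how that dependency might be declared in the training package's `setup.py` (the package name and version here are hypothetical):

    from setuptools import find_packages, setup

    setup(
        name='trainer',               # hypothetical package name
        version='0.1',
        packages=find_packages(),
        install_requires=[
            'ktrain',
            'gcsfs>=0.1.0',           # lets pandas open gs:// paths
        ],
    )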