
So my problem is that I have a few files that don't show up in gcsfuse when the bucket is mounted. I can see them in the online console and when I run 'ls' with gsutil. Also, if I manually create the folder in the bucket, I can then see the files inside it, but I have to create it first. Any suggestions?

gs://mybucket/
    dir1/
        ok.txt
        dir2/
            lafu.txt

If I mount mybucket with gcsfuse and run 'ls', it only returns dir1/ok.txt. If I then create the folder dir1/dir2 under the mount point, 'lafu.txt' suddenly shows up.

cupcakearmy
    What incredibly odd behavior. Sure enough, after I re-created the three layers of parent directories by hand, the last layer had my file inside. Poor form, Google. :/ – Kyle Baker Apr 02 '18 at 00:31

3 Answers


By default, gcsfuse won't show a directory "implicitly" defined by a file with a slash in its name. For example, if your bucket contains an object named dir/foo.txt, you won't be able to find it unless there is also an object named dir/.
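A minimal sketch for spotting which "directories" are affected, assuming you already have the bucket's object names one per line, relative to the bucket root. It prints every prefix ending in "/" that has no placeholder object of its own. `find_implicit_dirs` is a hypothetical helper written for this illustration, not part of gcsfuse or gsutil.

```sh
# find_implicit_dirs
# Reads object names (one per line, relative to the bucket root) on stdin and
# prints every directory prefix that has no placeholder object ending in "/".
# Without --implicit-dirs, gcsfuse does not list these directories.
find_implicit_dirs() {
  awk '
    {
      names[$0] = 1                    # every object name, exactly as listed
      n = split($0, parts, "/")
      prefix = ""
      for (i = 1; i < n; i++) {        # every component except the last
        prefix = prefix parts[i] "/"
        prefixes[prefix] = 1           # a "directory" implied by this name
      }
    }
    END {
      for (p in prefixes)
        if (!(p in names))             # implied, but no placeholder object exists
          print p
    }
  ' | LC_ALL=C sort
}
```

For a bucket holding dir1/, dir1/ok.txt and dir1/dir2/lafu.txt, it prints only dir1/dir2/. A listing could be fed in with something like `gsutil ls "gs://mybucket/**" | sed 's|^gs://mybucket/||' | find_implicit_dirs`, where mybucket is a placeholder.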

You can work around this by setting the --implicit-dirs flag, but there are good reasons why this is not the default. See the documentation for more information.
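Mounting with the flag might look like the sketch below. `mount_with_implicit_dirs` is a hypothetical wrapper, and the bucket name and mount point are placeholders; the only gcsfuse-specific parts are the `--implicit-dirs` flag and the `gcsfuse [flags] BUCKET MOUNT_POINT` argument order.

```sh
# mount_with_implicit_dirs BUCKET MOUNT_POINT
# Hypothetical wrapper: create the mount point if necessary, then mount the
# bucket so that directories implied only by object names are listed too.
# The body runs in a subshell ( ... ) so its variables stay local.
mount_with_implicit_dirs() (
  bucket=${1:?bucket name required}
  mnt=${2:?mount point required}
  mkdir -p "$mnt" && gcsfuse --implicit-dirs "$bucket" "$mnt"
)
```

For example, `mount_with_implicit_dirs mybucket "$HOME/mnt"`. On Linux, the mount can be removed again with `fusermount -u "$HOME/mnt"`.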

jacobsa
  • Thank you very much!! This is what I was searching for. Latency is not that big of a problem, so this solves everything :) – cupcakearmy Jul 12 '16 at 09:24
  • Done :) I didn't know that was a thing. (My first stack overflow question) – cupcakearmy Jul 14 '16 at 08:02
    I appreciate the documentation link explanation, but that's still a questionable UI. Perhaps detection of 'invisible' 'directories' leading to a notification pointing to the appropriate documentation (or suggestion of the `--implicit-dirs` flag) would be appropriate. I shouldn't have to waste an hour of my time trying to figure out what's going on. – Kyle Baker Apr 02 '18 at 00:35
    I file this one under "things I would have never ever ever solved without Stack Overflow" ;) – Matt Fletcher Jan 01 '20 at 20:32

Google Cloud Storage doesn't have folders. The various interfaces use different tricks to pretend that folders exist, but ultimately there's just an object whose name contains a bunch of slashes. For example, "pictures/january/0001.jpg" is the full name of a single object.
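To make the flat-namespace point concrete, here is a small POSIX shell sketch that lists the "folders" an interface would infer from a single object name. The function name `ancestor_prefixes` is made up for illustration. Nothing in the bucket corresponds to these prefixes unless an object with exactly that name (for example pictures/) also exists.

```sh
# ancestor_prefixes NAME
# Print each "folder" prefix implied by an object name, shallowest first.
# The body runs in a subshell ( ... ) so rest/prefix do not leak to the caller.
ancestor_prefixes() (
  rest=$1
  prefix=
  # Loop while the remaining part of the name still contains a slash.
  while [ "${rest#*/}" != "$rest" ]; do
    prefix="$prefix${rest%%/*}/"   # append the next path component plus "/"
    rest=${rest#*/}                # drop that component from the remainder
    printf '%s\n' "$prefix"
  done
)
```

`ancestor_prefixes pictures/january/0001.jpg` prints pictures/ and pictures/january/, while a name with no slash prints nothing.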

If you need to be sure that a "folder" exists, put an object inside it.

Brandon Yarbrough
  • Thanks for the clarification, that already helps. I think I didn't explain myself too well, so I'll modify the question. – cupcakearmy Jul 11 '16 at 21:54

@Brandon Yarbrough suggests creating the needed directory entries in the GCS bucket. This avoids the performance penalty of --implicit-dirs that @jacobsa's answer alludes to.

Here is a bash script for doing so:

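```bash
#!/usr/bin/env bash
# 1.  Mount $BUCKET_NAME at $MOUNT_PT with gcsfuse
# 2.  Run this script: mk_bucket_dirs.sh [MOUNT_PT] BUCKET_NAME [y|n]
MOUNT_PT=${1:-$HOME/mnt}
BUCKET_NAME=${2:?BUCKET_NAME is required}
DEL_OUTFILE=${3:-y}    # Set to y or n (delete the list of directory names when done)

echo "Reading objects in $BUCKET_NAME"
OUTFILE=dir_names.txt
# "**" matches every object in the bucket recursively; quote it so the local
# shell does not try to expand it. dirname gives each object's parent "directory".
gsutil ls "gs://$BUCKET_NAME/**" | while IFS= read -r BUCKET_OBJ
do
    dirname "$BUCKET_OBJ"
done | sort -u > "$OUTFILE"

echo "Processing directories found"
while IFS= read -r DIR_NAME
do
    # Turn gs://BUCKET/a/b (or gs://BUCKET itself) into the relative path a/b
    LOCAL_DIR=${DIR_NAME#"gs://$BUCKET_NAME"}
    LOCAL_DIR=${LOCAL_DIR#/}
    TARG_DIR="$MOUNT_PT/$LOCAL_DIR"
    if [ ! -d "$TARG_DIR" ]
    then
        # mkdir through the gcsfuse mount creates the missing "dir/" objects
        echo "Creating $TARG_DIR"
        mkdir -p "$TARG_DIR"
    fi
done < "$OUTFILE"

if [ "$DEL_OUTFILE" = "y" ]
then
    rm "$OUTFILE"
fi
echo "Process complete"
```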

I wrote this script, and have shared it at https://github.com/mherzog01/util/blob/main/sh/mk_bucket_dirs.sh.

This script assumes that you have mounted a GCS bucket locally on a Linux (or similar) system. It takes the mount point and the bucket name as arguments, lists every object in the bucket, and collects the distinct parent "directories". It then creates, through the mount, each of those directories that is not visible locally.

For me, this fixed the issue of folders (and the objects inside them) not showing up in the mounted folder structure.

mherzog
  • If you are linking to your own script then please add a proper affiliation in your answer. Otherwise, it will be considered spam – Sabito stands with Ukraine Oct 27 '20 at 19:05
  • Just a link to your GitHub repo doesn't make for an answer on Stack Overflow. Answers must actually answer the question, without the requirement that the user click through to some other site to get the answer. Please [add context around links](//meta.stackoverflow.com/a/8259). **[Always quote](/help/referencing) the most relevant part of an important link, in case the target site is unreachable or goes permanently offline.** Take into account that being _barely more than a link to an external site_ is a reason as to [Why and how are some answers deleted?](/help/deleted-answers). – Makyen Oct 27 '20 at 19:16
  • Thank you for adding affiliation. However, to get the *real* answer (your script), one still has to go off-site. That might be reasonable, if the code required exceeded the capacity of an answer (then only major parts would need to be in the answer), but in this case, the script fits in an answer. In cases where I have something like this, I've both included the code in the answer and provided a link to it on GitHub, perhaps mentioning that the GitHub version is going to be the most current. As it is, this is still just an announcement that your script exists, rather than the *actual* answer. – Makyen Oct 30 '20 at 18:47