
I needed to move a large S3 bucket to a local file store for a variety of reasons, and the files were stored as 160,000 directories with subdirectories.

As this is far too many folders to browse with something like a GUI FTP interface, I'd like to move the 160,000 root directories into, say, 320 directories with 500 in each.

I'm a newbie at bash scripting, and I just wrote this up, but I'm scared I'm going to mangle the whole thing and have to redo the transfer. I tested with `[[ "$i" -ge 3 ]]` and some directories with subdirectories and it looked like it worked okay, but I'm quite nervous. I do not want to retransfer all this data.

i=0;
j=0;

for file in *; do
  if [[ -d "$file" && ! -L "$file" ]];
    then
      ((i++))
      echo "directory $file is being written to assets_$j";
      mv $file ./assets_$j/;
      if [[ "$i" -ge 499 ]];
        then
          ((j++));
          ((i=0));
      fi
  fi;
done

Thanks for the help!

Giallo

  • Your question is not really clear. Can you give an example of what you want with, let's say, 10 directories? – kvantour Mar 19 '20 at 14:21
  • Maybe `rsync` can be of utility. If I were you, I would create a file with a list of folders to move, and then I would traverse that file and keep a record in another file of what work has been done. – Poshi Mar 19 '20 at 14:25
  • You could `cp` instead of `mv`. Then if things go awry, you can just restart it. In this case, `cp` shouldn't be much slower (other than the fact that you may want to `rm` the originals when you have a good local copy). – dstromberg Mar 19 '20 at 14:26
  • You should probably add a `[[ $file = assets_* ]] && continue` at the top of your loop so you don't try to rename `assets_*` directories that already exist. – Charles Duffy Mar 19 '20 at 15:10
  • And btw, `if (( i >= 499 ))` is the better bashism for comparing `i` to `499`. If you're going to use a non-POSIX-compliant syntax, might as well pick the more readable, built-to-purpose one. :) – Charles Duffy Mar 19 '20 at 15:10
  • @dstromberg, ...I'd suggest `ln` instead of `cp`; that way there's no duplication of storage (except for the inodes). – Charles Duffy Mar 19 '20 at 15:11
  • @CharlesDuffy when I tested this with `i > 3`, it didn't iterate over the new directories... but that's a good defensive piece of code to put in there. – Giallo Mar 19 '20 at 16:01
  • @Giallo, right; the glob just runs once before the loop is invoked at all, but the concern is what happens if you cancel and restart partway through, or want to incrementally move more files. – Charles Duffy Mar 19 '20 at 16:12
  • @Giallo, btw, one thing that's more important is changing `mv $file` to `mv "$file"`. Going to have a very bad day with filenames containing spaces or glob characters otherwise. (That's also an issue http://shellcheck.net/ would identify for you automatically.) – Charles Duffy Mar 19 '20 at 16:13
  • @CharlesDuffy thanks for the help! – Giallo Mar 20 '20 at 15:45
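Pulling the comment suggestions together (quoting `$file`, skipping existing `assets_*` directories, and creating each target before moving into it), a hardened sketch of the loop could look like the following. This is only a demo run in a scratch directory with a hypothetical chunk size of 2 instead of 500:

```shell
# Demo in a scratch directory -- swap in your transfer directory and chunk=500 for real use
cd "$(mktemp -d)"
mkdir d01 d02 d03 d04 d05

chunk=2
i=0
j=0
for file in *; do
  [[ $file = assets_* ]] && continue        # don't re-process target dirs on a restart
  if [[ -d "$file" && ! -L "$file" ]]; then
    mkdir -p "./assets_$j"                  # make sure the target exists before moving
    mv -- "$file" "./assets_$j/"            # quoted: safe for spaces and glob characters
    (( ++i >= chunk )) && { (( j++ )); i=0; }
  fi
done
ls    # now: assets_0 assets_1 assets_2
```

With `chunk=2`, `d01`/`d02` land in `assets_0`, `d03`/`d04` in `assets_1`, and `d05` in `assets_2`; the re-run guard means cancelling and restarting partway through is safe.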

3 Answers

3
  • find all the directories in the current folder.
  • Read them in fixed-size chunks.
  • Exec `mkdir` and `mv` for each chunk.

find . -mindepth 1 -maxdepth 1 -type d |
while readarray -n 10 -t files && (( ${#files[@]} )); do  # -n 10 per chunk here; use -n 500 for the real layout
     dest="./assets_$((j++))/"
     echo mkdir -v -p "$dest"
     echo mv -v "${files[@]}" "$dest"
done
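To see the chunking behave before touching real data, here is a hypothetical dry run of the same idea in a scratch directory, with 5 folders and chunks of 2; the added `sort` makes the batch order deterministic, since `find` otherwise returns directory order:

```shell
# Hypothetical demo: 5 dirs, chunks of 2 (use -n 500 for the real layout)
cd "$(mktemp -d)"
mkdir d1 d2 d3 d4 d5

j=0
find . -mindepth 1 -maxdepth 1 -type d | sort |
while readarray -n 2 -t files && (( ${#files[@]} )); do
    dest="./assets_$((j++))"
    mkdir -p "$dest"
    mv -- "${files[@]}" "$dest"   # real mv here; keep the echos above until you trust it
done
```

Note this relies on newline-delimited names, so it (like the answer) assumes no directory names contain newlines.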
KamilCuk
3

On the condition that assets_1, assets_2, etc. do not exist in the working directory yet:

dirs=(./*/)
for (( i=0,j=1; i<${#dirs[@]}; i+=500,j++ )); do
    echo mkdir ./assets_$j/ 
    echo mv "${dirs[@]:i:500}" ./assets_$j/
done

If you're happy with the output, remove the `echo`s.
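As a sanity check, the same slicing pattern with hypothetical names and a chunk size of 3 instead of 500:

```shell
# Hypothetical demo: 7 dirs, chunks of 3
cd "$(mktemp -d)"
mkdir d1 d2 d3 d4 d5 d6 d7

dirs=(./*/)                           # snapshot taken before any assets_* exist
for (( i=0,j=1; i<${#dirs[@]}; i+=3,j++ )); do
    mkdir "./assets_$j"
    mv -- "${dirs[@]:i:3}" "./assets_$j/"
done
```

Because `dirs` is expanded once up front, the newly created `assets_*` directories can never end up in the list of things being moved.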

oguz ismail
  • I'd suggest `printf '%q ' mv "${dirs[@]:i:500}" ./assets_$j/; echo` -- the way it's done now will misrepresent what is actually run with names with spaces/globs/etc. – Charles Duffy Mar 19 '20 at 16:28
  • (And maybe add a `-p` to the `mkdir` so it doesn't complain when the directory already exists... well, either that, or loop on incrementing `j` until we get a directory that *doesn't* exist.) – Charles Duffy Mar 19 '20 at 16:29
  • @Charles you're right, but that'd be confusing for OP – oguz ismail Mar 19 '20 at 16:31
  • @Charles if the directory already exists, this will make a mess. – oguz ismail Mar 19 '20 at 16:32
  • Not if you add a guard to skip files that are themselves `assets_*` ones, and the increment loop to find a nonexistent target directory I described above. Which is to say, the mess in question is very preventable. – Charles Duffy Mar 19 '20 at 16:37
  • (Maybe better to replace `*` with `!(assets_*)`.) – Charles Duffy Mar 19 '20 at 16:38
  • @Charles I'm not saying you're wrong, but these additions would make it too complicated. OP **mustn't** have assets_n dirs in `.` and everything's gonna be fine – oguz ismail Mar 19 '20 at 17:09
  • I'm not saying you're wrong either -- the answer has my upvote; just kibitzing, it's what I do. :) – Charles Duffy Mar 19 '20 at 17:11
1

A possible way, though you have no control over the counter values, is:

find . -mindepth 1 -maxdepth 1 -type d -print0 \
   | xargs -0 -n 500 sh -c 'echo mkdir -v ./assets_$$ && echo mv -v "$@" ./assets_$$' _

This derives the assets counter from the PID of each `sh` invocation; PIDs are only reused once the wrap-around value is reached (Linux PID recycling).

The order in which `find` returns entries is slightly different from the glob `*` (see the `find` command's default sorting order).

If you want the order to be alphabetical, you can add a simple `sort`:

find . -mindepth 1 -maxdepth 1 -type d -print0 | sort -z \
   | xargs -0 -n 500 sh -c 'echo mkdir -v ./assets_$$ && echo mv -v "$@" ./assets_$$' _

Note: remove the `echo`s if you are pleased with the output.
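If sequential numbering matters, a possible workaround (my sketch, not part of the original answer) is to keep the counter in the shell and derive the chunk number from it, at the cost of one `mv` per directory. Demonstrated here in a scratch directory with a hypothetical chunk size of 2:

```shell
# Sketch: sequential assets_0, assets_1, ... (demo chunk size 2; use 500 for real data)
cd "$(mktemp -d)"
mkdir d1 d2 d3 d4 d5

n=0
chunk=2
find . -mindepth 1 -maxdepth 1 -type d ! -name 'assets_*' -print0 | sort -z |
while IFS= read -r -d '' dir; do
    dest="./assets_$(( n / chunk ))"   # integer division gives 0,0,1,1,2,...
    mkdir -p "$dest"
    mv -- "$dir" "$dest"
    n=$(( n + 1 ))
done
```

The NUL-delimited `read` keeps this safe for names with spaces, and the `! -name 'assets_*'` guard makes it safe to re-run.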

kvantour