I have a lot of files (around 50K) and need to move each one into a folder for the day it was created. The creation day must be taken from the file name, which looks like this:

normal_007a02ece6e249d2_940163493_210061_user_sector_23938_22-46-58_2019-12-10-00CA01DF-10270594-00000001.mp3

The name contains some extra information, but this file has to go into folder 10, because it was created on the 10th day of the month. So I have 31 folders (1-31) and must move each file into its respective day folder. Any clue how I could do this?

Edit: This is what I am trying:

pattern='(normal_[A-Za-z0-9].*_[A-Z].*_[0-9]{5}_[0-9]*-[0-9]*-[0-9]*_([0-9]*)-([0-9]*)-([0-9]*).*)'

for file in *.mp3
do
    # echo "$file"

    if [[ $file =~ $pattern ]]; then
        echo "Match ${BASH_REMATCH[0]}"
    fi
done

Thanks in advance

  • This might help: [How can I just extract one underbar-separated field from a filename?](https://stackoverflow.com/q/60346648/3776858) – Cyrus Feb 21 '20 at 22:18
  • @Cyrus The title of that question seems relevant, but the answer isn't. – Barmar Feb 21 '20 at 22:23
  • No... the lengths of the user and sector values are not fixed: user could be "jhon.doper", "jhon.doe", "jenny.alva", etc., and sector could vary as well. I have this RE that matches the file name, normal_[A-Za-z0-9].*_[A-Z].*_[0-9]{5}_[0-9]*-[0-9]*-[0-9]*_[0-9]*-[0-9]*-[0-9]*.mp3, but can't figure out how to extract the day from it for a lot of files – Jorge Cornejo Bellido Feb 21 '20 at 23:16
  • Why is my question being downvoted? What is wrong with it? Please understand that the fields are not fixed length – Jorge Cornejo Bellido Feb 21 '20 at 23:24
  • Thanks, edited my original post – Jorge Cornejo Bellido Feb 21 '20 at 23:48
  • Plus one for improving your Q. ... if you can be sure that the only `-` chars in your filename are in the `YYYY-MM-DD` portion, then just `echo "$fName" |sed 's/^.*\([1-2][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]\).*$/\1/'` should get you the date part. Do with that what you need. Organizing data as described in your header `dd_mm_yyy` is a headache in the making. Your example date is a much better format for long-term storage of data. Everything sorts naturally and you will not need to reparse filenames to figure out where you are. Good luck! – shellter Feb 22 '20 at 00:51
  • Based on shellter's suggestion, would you please try: `pattern='[1-2][0-9][0-9][0-9]-[0-1][0-9]-([0-3][0-9])'`. Then the `DD` field is captured in bash variable `${BASH_REMATCH[1]}`. Use `${BASH_REMATCH[1]#0}` to suppress the leading `0`. – tshiono Feb 22 '20 at 04:36
  • Thanks, but as you can see, the hyphen (-) is not only part of the date section; it also appears in hh-mm-ss_yyyy-mm-dd-code1-code2-code3 – Jorge Cornejo Bellido Feb 22 '20 at 11:53
  • @JorgeCornejoBellido : Yes, I see now that your data does have other `-`, but unless you have two (or more) date strings embedded in your filename, the restrictions for month-day (`[0-1][0-9]-[0-3][0-9]`) will return the correct value. I tested it now with the sample you provided and it returned only the YYYY-MM-DD value. I was also trying to point out that creating a perfect regex when you have complex rules to match is troublesome, and it is often easier to accomplish your goal by focusing only on the value you need. If that doesn't work for you, I understand. Good luck! – shellter Feb 22 '20 at 15:41
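
The short pattern suggested in the comments above can be sketched as a small bash function. This is only an illustration: the name `day_of` is hypothetical, and it assumes the file name contains a single `yyyy-mm-dd` string, as in the sample.

```bash
#!/usr/bin/env bash
# day_of (hypothetical helper): print the day of month found in a file name,
# without a leading zero. Returns non-zero if no yyyy-mm-dd string is found.
day_of() {
    # Four-digit year starting with 1 or 2, two-digit month, two-digit day.
    # The hh-mm-ss field cannot match because it has no four-digit group.
    local re='[12][0-9]{3}-[01][0-9]-([0-3][0-9])'
    [[ $1 =~ $re ]] || return 1
    # ${...#0} strips one leading zero, so "05" becomes "5".
    printf '%s\n' "${BASH_REMATCH[1]#0}"
}

# Usage sketch:
#   day_of "normal_..._22-46-58_2019-12-10-00CA01DF-10270594-00000001.mp3"
```

If a user or sector value could itself look like a date, this leftmost match would pick the wrong field, so a stricter pattern would be needed in that case.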

2 Answers

Your pattern is wrong. For the sample you gave, this one seems to match, but I don't have enough data to be sure:

pattern='normal_[[:xdigit:]]{16}_[[:digit:]]{9}_[[:digit:]]{6}_[^_]*_[^_]*_[[:digit:]]*_[[:digit:]]*-[[:digit:]]*-[[:digit:]]*_([[:digit:]]{4})-([[:digit:]]{2})-([[:digit:]]{2})-[[:xdigit:]]{8}-[[:xdigit:]]{8}-[[:xdigit:]]{8}.mp3'
for file in *.mp3; do [[ $file =~ $pattern ]] && echo ${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}; done
2019/12/10
2019/12/13

However, my approach for this would be different: generate the dates sequentially and use find to locate the files for each date.

for ((i=1;i<=31;i++)); do
    DATE=$(date -d "2019-11-30 +$i days" +%Y-%m-%d)
    # echo makes this a dry run that only prints the mv commands
    find . -regextype posix-egrep \
         -iregex '.*normal_[[:xdigit:]]{16}_[[:digit:]]{9}_.*'"$DATE"'-[[:xdigit:]]{8}-[[:xdigit:]]{8}-[[:xdigit:]]{8}\.mp3' \
         -exec echo mv --target-directory=/some/absolute/path/${DATE//-/\/}/ {} +
done


mv --target-directory=/some/absolute/path/2019/12/10/ ./normal_007a02ece6e249d2_940163493_210061_user_sector_23938_22-46-58_2019-12-10-00CA01DF-10270594-00000001.mp3
mv --target-directory=/some/absolute/path/2019/12/13/ ./normal_007a02ece6e249d2_940163493_210061_user_sector_23938_22-46-58_2019-12-13-00CA01DF-10270594-00000001.mp3

Short explanation:

  • This searches for files matching the regular expression with the value of $DATE embedded in it (adjust the starting date and the loop limit for different ranges).
  • With -exec {} +, files are moved in batches into one directory per date (for simplicity, the directory is named after that same date).
  • Remove the echo once you are sure that the results are OK.
  • Double-check the regex against your real file names.

Edit: if you want a hierarchical structure (yyyy/mm/dd), you can either use pattern substitution on the date (${DATE//-/\/} replaces all dashes with /) or use $i directly and limit yourself to a single month.
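
As a rough illustration of the two options in this edit, the following sketch turns a yyyy-mm-dd string either into a yyyy/mm/dd path or into a bare day number. The function names `date_to_path` and `day_only` are hypothetical and not part of the answer's loop.

```bash
#!/usr/bin/env bash
# date_to_path (hypothetical helper): 2019-12-10 -> 2019/12/10
date_to_path() {
    local d=$1
    # ${d//-//} replaces every "-" in $d with "/".
    printf '%s\n' "${d//-//}"
}

# day_only (hypothetical helper): 2019-12-05 -> 5
day_only() {
    local dd=${1##*-}          # drop everything up to the last "-"
    printf '%s\n' "${dd#0}"    # drop one leading zero, if present
}
```

The first form suits a yyyy > mm > dd tree; the second suits a flat set of 1-31 folders for a single month.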

Another approach would be to use find's -mtime/-ctime tests instead of the date in the file name.
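
A minimal sketch of the timestamp idea follows. It does not use find's -mtime test itself; it swaps in GNU date's `-r FILE` option, which prints a file's last-modification time. The name `mtime_path` is hypothetical, and GNU date is assumed (BSD date treats `-r` differently).

```bash
#!/usr/bin/env bash
# mtime_path (hypothetical helper): print yyyy/mm/dd for a file's
# last-modification time. Assumes GNU date.
mtime_path() {
    date -r "$1" +%Y/%m/%d
}

# Usage sketch (dry run):
#   for f in *.mp3; do echo mv "$f" "/some/absolute/path/$(mtime_path "$f")/"; done
```

This only gives the right folder if the modification time still reflects the recording date; copying files without preserving timestamps would break that, so the date in the file name is probably the safer source.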

Sorin
  • Thanks! My goal was just to get the day (1-31) not the date as yyyy-mm-dd, I adjusted the pattern to pattern='normal_[[:xdigit:]]{16}_[[:digit:]]{9}_[[:digit:]]{6}_[^_]*_[^_]*_[[:digit:]]*_[[:digit:]]*-[[:digit:]]*-[[:digit:]]*_[[:xdigit:]]{4}-[[:xdigit:]]{2}-(.*)-[[:xdigit:]]{8}-[[:xdigit:]]{8}-[[:xdigit:]]{8}.mp3' – Jorge Cornejo Bellido Feb 22 '20 at 12:02
  • I needed the day, because these files are already with a folder structure of yyyy > mm but that is giving performance issues, so now we will break also into days yyyy > mm > dd subfolders. – Jorge Cornejo Bellido Feb 22 '20 at 12:04
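
For the yyyy > mm > dd layout described in these comments, a per-file move could look roughly like the sketch below. The function name `move_by_date` and the `DRY_RUN` variable are hypothetical, and the simplified pattern only assumes that the date is preceded by `_` and followed by `-`, as in the sample file name.

```bash
#!/usr/bin/env bash
# move_by_date FILE BASE (hypothetical helper): move FILE into
# BASE/yyyy/mm/dd, taking the date from the file name.
# With DRY_RUN set, only print the mv command.
move_by_date() {
    local file=$1 base=$2
    local re='_([0-9]{4})-([0-9]{2})-([0-9]{2})-'
    # Match against the base name only, so directory names cannot interfere.
    if ! [[ ${file##*/} =~ $re ]]; then
        printf 'skipped (no date found): %s\n' "$file" >&2
        return 1
    fi
    local dest="$base/${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/${BASH_REMATCH[3]}"
    if [[ -n ${DRY_RUN:-} ]]; then
        printf 'mv %s %s/\n' "$file" "$dest"
    else
        mkdir -p -- "$dest" && mv -- "$file" "$dest/"
    fi
}

# Usage sketch:
#   for f in *.mp3; do DRY_RUN=1 move_by_date "$f" /some/absolute/path; done
```

The day folder keeps its leading zero here (for example 05); `${BASH_REMATCH[3]#0}` would give 5 instead, as tshiono's comment on the question suggests. A plain `for f in *.mp3` loop does not hit the argument-length limit, since the glob is expanded inside the shell and mv receives one file at a time.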

To extract a string of the form dddd-dd-dd where d is a digit you may use GNU grep. Then you can read the date into y, m and d variables:

str=$(grep -Po '\d{4}-\d{2}-\d{2}' <<< 'normal_2019-12-10-00.mp3')
read -r y m d < <(date -d "$str" '+%Y %m %d')
echo "$y.$m.$d"
2019.12.10

Note that this pattern accepts any dddd-dd-dd digit string. Matching only valid yyyy-mm-dd dates requires a more complex regex, as explained in "Regex to validate date format dd/mm/yyyy".
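
As an alternative to a validating regex, one could let date itself reject impossible dates. The sketch below is only an illustration; the name `is_valid_date` is hypothetical, and GNU date (with its `-d` option) is assumed.

```bash
#!/usr/bin/env bash
# is_valid_date (hypothetical helper): succeed only if $1 is a real
# calendar date written as yyyy-mm-dd. Assumes GNU date.
is_valid_date() {
    local out
    # Cheap shape check first, so free-form strings never reach date -d.
    [[ $1 =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || return 1
    # GNU date exits non-zero for dates such as 2019-02-30.
    out=$(date -d "$1" +%Y-%m-%d 2>/dev/null) || return 1
    # Round-trip check: the parsed date must print back unchanged.
    [[ $out == "$1" ]]
}

# Usage sketch:
#   is_valid_date "$str" && echo "ok: $str"
```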

builder-7000