
I have a script that was very kindly provided for me a while ago which allowed me to generate input files by inserting coordinates from a series of .xyz files into a template file (Create new files by copying contents of coordinate files into template file).

I'm trying to adapt that script to do something very similar, but different in a very slight, but annoying way. In the script, the new directories created to house these new files are named like this:

    # File name is in the form '....Hnnn.xyz';
    # this will parse nnn from that name.
    local inputNumber=$coordFile
    # Remove '.xyz'.
    inputNumber=${inputNumber%.xyz}
    # Remove everything up to and including the 'H'.
    inputNumber=${inputNumber##*H}

    # Subdirectory name is based on the input number.
    local outDir=$baseDir/D$inputNumber
    # Create the directory if it doesn't exist.
    if [[ ! -d $outDir ]]; then
        mkdir $outDir
    fi

This worked for my last problem, because the files were all named in the form `xxxx_DH000.xyz`. However, now the files I have are named using the form `xxxx.000.xyz`. While everything else in the script works, I cannot figure out how to name the new directories in the form `000`.

The line in the script that I think needs a slight edit is `inputNumber=${inputNumber##*H}`. What I cannot figure out is how to get the script to delete everything up to, but not including, a `0`. I've searched online, but the only questions/answers I've found about renaming files by stripping part of the original name talk about deleting everything 'up to and including' a string.

I was able to generate directories named `1`, `2`, `3`, etc. with `inputNumber=${inputNumber##*0}`; however, I want all three digits present (i.e. I would like to create directories `001`, `002`, `003`, etc.).
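For reference, a quick POSIX `sh` check (same behavior in bash and zsh) shows why `##*0` loses the leading zeros: `##` removes the *longest* prefix matching `*0`, i.e. everything up to and including the last `0`. The three names below are hypothetical variations on the example name from the question, chosen only to show the edge cases; this is a sketch, not part of the original script.

```shell
# Hypothetical file names, chosen to show what ${var##*0} leaves behind.
for f in tma.h2s-2-pes-b97m-d4-tz.001.xyz \
         tma.h2s-2-pes-b97m-d4-tz.010.xyz \
         tma.h2s-2-pes-b97m-d4-tz.100.xyz
do
    n=${f%.xyz}    # remove the extension
    n=${n##*0}     # remove everything up to and including the LAST '0'
    printf '%s -> "%s"\n' "$f" "$n"
done
# tma.h2s-2-pes-b97m-d4-tz.001.xyz -> "1"
# tma.h2s-2-pes-b97m-d4-tz.010.xyz -> ""
# tma.h2s-2-pes-b97m-d4-tz.100.xyz -> ""
```

So any number ending in `0` would produce an empty directory suffix.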

As an aside, I cannot use the `.` as the cutoff point, as there are multiple `.`s in each file name. An example of one of the file names is `tma.h2s-2-pes-b97m-d4-tz.011.xyz`.

Is there some way to get the script to simply name the directories based on the full three-digit number?

  • Can you show what the expected output for `xxxx.000.xyz` is? Also, after removing `.xyz`, are there still more dots in the string? It's still pretty easy to remove everything up to the first/last dot. – choroba Jan 11 '23 at 08:47
  • Your question can't be about bash and zsh at the same time. I removed the Bash tag. – tripleee Jan 11 '23 at 09:34
  • `inputNumber=${inputNumber##*.}` should work fine, but, if you're getting the filenames from a glob expansion, then `zsh` might have a way to do it directly in the glob expression. – Fravadona Jan 11 '23 at 11:19
  • As an alternative to Fravadona's approach: If you know that the xxxx part can **not** contain digits, you could also do `[[ $coordFile =~ [[:digit:]]+ ]] && inputNumber=$MATCH`. This makes the step for removing `.xyz` unnecessary. But for getting a good answer to your question, you would have to define exactly what the content of `coordFile` can look like. – user1934428 Jan 11 '23 at 11:25
  • @tripleee Sorry, I thought they were mostly equivalent, and I don't mind using either bash or zsh (csh is totally foreign to me, though...) – isolated matrix Jan 12 '23 at 04:33
  • @Fravadona Your solution worked! I completely forgot that the .xyz had been stripped, and that the last full stop was actually the one before the numbers. Thank you so much! While I already have my solution, @user1934428 and @choroba, an example of the file name in question is `tma.h2s-2-pes-b97m-d4-tz.011.xyz`, so a solution based on the numbers wouldn't work. – isolated matrix Jan 12 '23 at 04:41



Although it's not needed in this case, zsh does support deleting text just before a matched pattern in a string. These parameter expansions will remove everything prior to the first `0` in the string, but keep the `0`:

    inputNumber='tma.h2s-2-pes-b97m-d4-tz.011.xyz'
    inputNumber=${inputNumber:r} # remove '.xyz'
    inputNumber=${(SM)inputNumber##0*}
    print ${inputNumber}
    # ==> 011

This includes a few zsh-isms:

  • `${...:r}` returns the 'root' of a filename, i.e. it removes the extension.
  • `(S)` - parameter expansion flag that changes the behavior of the `##` expansion: it now searches for the pattern in the middle of the string, not just at the beginning.
  • `(M)` - flag that makes the expansion return the matched portion (the `0*`) instead of removing it.
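bash and plain `sh` have no `(S)` or `(M)` flags, but the literal request from the question ('delete everything up to but not including a `0`') can be sketched with two standard expansions: `%%0*` yields the text before the first `0`, and `#` then removes exactly that text. This is a portable sketch rather than part of the answer above, and it carries the same dependence on a leading `0` (with no `0` at all, the result is an empty string).

```shell
inputNumber='tma.h2s-2-pes-b97m-d4-tz.011'   # '.xyz' already removed
prefix=${inputNumber%%0*}              # text before the first '0'
inputNumber=${inputNumber#"$prefix"}   # remove exactly that text (quoted, so matched literally)
printf '%s\n' "$inputNumber"           # 011
```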

This depends on the number always starting with `0`, which may not be a good choice - what file comes after `099`? (For `...tz.100`, the first `0` is the middle digit, so the result would be `00`.)


This next version uses a zsh extended glob pattern to find a number between two periods and returns that number, i.e. it will find the number in `.11.`, `.011.`, or `.2345.`, but not in `.x11.`:

    coordFile='tma.h2s-2-pes-b97m-d4-tz.022.xyz'
    inputNumber=${(*)coordFile//(#b)*.(<->).*/${match}}
    print ${inputNumber}
    # ==> 022

Some of the pieces:

  • `${...//.../...}` - substitution expansion.
  • `(*)` - enables `extendedglob` for this expansion.
  • `(#b)` - globbing flag to enable 'backreferences', so that `$match` will work.
  • `<->` - matches a number. This can be restricted to a range if needed, like `<100-199>`.
  • `(<->)` - puts the number into a match group.
  • `*.` and `.*` - everything before and after the number; these are not in the match group.
  • `${match}` - the matched string from the parenthesized part of the pattern. This is used as the replacement for the entire string, so we get just the number. If more than one part of the input string matches the pattern, this will be the last one, because the leading `*` is greedy. `match` is actually an array, but since there's only one match group in the pattern, it does not need to be indexed with `${match[1]}`.
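Unlike `<->`, the plain `#`/`##` expansions do not check that the field between the periods is numeric. If that guarantee matters in bash or plain `sh`, a `case` pattern can supply it. This is a minimal sketch; the helper name `number_field` is hypothetical, and, unlike the glob version, it only looks at the field just before the final extension.

```shell
# Hypothetical helper: print the field between the last two '.'s of a
# name like 'xxxx.NNN.xyz', but only if that field is all digits.
number_field() {
    case $1 in
        *.*.*) ;;                  # need at least two '.'s
        *) return 1 ;;
    esac
    n=${1%.*}                      # drop the final extension
    n=${n##*.}                     # keep what follows the last remaining '.'
    case $n in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit: reject
    esac
    printf '%s\n' "$n"
}

number_field tma.h2s-2-pes-b97m-d4-tz.022.xyz || echo 'no number field'   # prints 022
number_field tma.h2s-2-pes-b97m-d4-tz.x11.xyz || echo 'no number field'   # prints no number field
```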

This variant uses a standard regular expression to find the number:

    coordFile='tma.h2s-2-pes-b97m-d4-tz.033.xyz'
    match=
    [[ $coordFile =~ .*\\.([[:digit:]]+)\\..* ]]
    inputNumber=${match[1]}
    print ${inputNumber}
    # ==> 033

After the `[[ ]]` test, the `match` array will contain matches from any parenthesized groups in the regular expression; here, that will be a set of one or more digits in between two periods / full stops. (zsh fills `match` this way unless the `BASH_REMATCH` option is set.)
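Shells without `=~` (such as plain `sh`) can apply the same regular-expression idea through `sed`. This sketch uses a POSIX basic regular expression and, as an assumption, anchors the match to a trailing `.xyz`. In bash, `[[ ... =~ ... ]]` also works, with the group available as `${BASH_REMATCH[1]}` instead of `match`.

```shell
coordFile='tma.h2s-2-pes-b97m-d4-tz.033.xyz'
# BRE: greedy '.*\.' then a captured run of digits, then a literal '.xyz'
# at the end; -n with the p flag prints only when the substitution matched.
inputNumber=$(printf '%s\n' "$coordFile" |
    sed -n 's/.*\.\([0-9][0-9]*\)\.xyz$/\1/p')
printf '%s\n' "$inputNumber"   # 033
```

If the name does not fit the pattern, `inputNumber` ends up empty, which is easy to test for before creating a directory.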


But, as @choroba and Fravadona have noted, since the number will always be at the end of the string once `.xyz` is removed, you can use the standard `#`/`##`/`%`/`%%` expansions to remove parts of the string based only on the `.`s. This is a common idiom that will be familiar to many shell programmers, and it will also work in bash (note that other parts of your original script depend on zsh).

    inputNumber='tma.h2s-2-pes-b97m-d4-tz.044.xyz'
    inputNumber=${inputNumber%.xyz}
    inputNumber=${inputNumber##*.}
    print ${inputNumber}
    # ==> 044
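The choice of operator is what makes the `.` usable as a cutoff despite the multiple dots. Here is what each of the four operators does to the example name (POSIX `sh`; the same in bash and zsh):

```shell
f='tma.h2s-2-pes-b97m-d4-tz.011.xyz'
printf '%s\n' "${f#*.}"    # h2s-2-pes-b97m-d4-tz.011.xyz  (removes shortest prefix ending in '.')
printf '%s\n' "${f##*.}"   # xyz                           (removes longest prefix ending in '.')
printf '%s\n' "${f%.*}"    # tma.h2s-2-pes-b97m-d4-tz.011  (removes shortest suffix starting with '.')
printf '%s\n' "${f%%.*}"   # tma                           (removes longest suffix starting with '.')

# Shortest suffix first, then longest prefix, isolates the number:
g=${f%.*}
printf '%s\n' "${g##*.}"   # 011
```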

In zsh everything can be consolidated into a single nested substitution:

    baseDir='files/are/here'
    coordFile='tma.h2s-2-pes-b97m-d4-tz.055.xyz'
    local outDir=$baseDir/D${${coordFile:r}##*.}
    print $outDir
    # ==> files/are/here/D055
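bash and plain `sh` do not allow nested substitutions like `${${coordFile:r}##*.}`, so there the two steps have to stay separate. Below is a minimal portable sketch of the original directory-creating fragment along those lines. The function name `make_out_dir` is hypothetical, and `mkdir -p` stands in for the original `[[ ! -d ... ]]` check (it also creates any missing parent directories).

```shell
# Hypothetical helper: create (if needed) the output directory for one
# coordinate file named like 'xxxx.NNN.xyz', then print its path.
make_out_dir() {
    baseDir=$1
    coordFile=$2
    inputNumber=${coordFile%.xyz}     # remove '.xyz'
    inputNumber=${inputNumber##*.}    # keep what follows the last '.'
    outDir=$baseDir/D$inputNumber
    # -p: no error if the directory already exists (replaces the -d test)
    mkdir -p -- "$outDir" || return
    printf '%s\n' "$outDir"
}
```

For example, `make_out_dir files/are/here tma.h2s-2-pes-b97m-d4-tz.055.xyz` would create and print `files/are/here/D055`. POSIX `sh` has no `local`, so the variables set inside the function remain visible afterwards; in bash or zsh they can be declared `local` as in the original fragment.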
Gairfowl