
Is it possible to create a truly unique directory name (i.e. based on a UUID) that is shorter than the default GUID format?

So far I've been able to come up with this:

function Get-UglyButShortUniqueDirname {
  [CmdletBinding()]
  param ()

  $t = "$([System.Guid]::NewGuid())".Replace("-", "")
  Write-Verbose "base guid: $t"
  $t = "$(0..$t.Length | % { if (($_ -lt $t.Length) -and !($_%2)) { [char][byte]"0x$($t[$_])$($t[$_+1])" } })".replace(" ", "").Trim()
  Write-Verbose "guid as ascii: $t"
 ([System.IO.Path]::GetInvalidFileNameChars() | % { $t = $t.replace($_, '.') })
  Write-Verbose "dirname: $t"
  $t
}

With this I can generate directory names that look weird but take only about 16 characters, which is way better than the 32 characters of a plain GUID (without dashes).

The thing I'm a bit concerned about: as 'invalid file name characters' are stripped and replaced with dots, those identifiers do not hold up to the same "uniqueness promise" as a guid does.
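
To illustrate the concern, here is a minimal sketch (the helper name `ConvertTo-SafeName` is hypothetical and not part of the function above). It assumes that both `/` and the NUL character are reported by `GetInvalidFileNameChars()`, which as far as I know holds on Windows as well as on Unix-like platforms. Two distinct inputs end up as the same directory name once every invalid character is replaced by a dot:

```powershell
# Hypothetical helper: replaces every invalid file name character with a dot,
# mirroring the replacement step of the function above.
function ConvertTo-SafeName([string]$Name) {
    foreach ($c in [System.IO.Path]::GetInvalidFileNameChars()) {
        $Name = $Name.Replace([string]$c, '.')
    }
    $Name
}

$first  = 'ab' + [char]0x2F + 'cd'   # contains '/'
$second = 'ab' + [char]0x00 + 'cd'   # contains NUL

$first -ceq $second                                            # False: the inputs differ
(ConvertTo-SafeName $first) -ceq (ConvertTo-SafeName $second)  # True: both become 'ab.cd'
```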

(struggling with legacy 260 char path-name limitations in Win-based automation environments :-/)

mwallner
  • A man can always `[System.IO.Path]::GetFileNameWithoutExtension([System.IO.Path]::GetRandomFileName())` – Santiago Squarzon Jan 26 '22 at 15:02
  • Actually, I need that folder name to be based on a UUID (external input), as it also needs to be unique across multiple hosts – mwallner Jan 26 '22 at 15:39
  • As a side note: since you are already using Unicode, what about using the [extended-length path](https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation) with the ``\\?\`` prefix? For example, `\\?\D:\very long path`. – iRon Jan 26 '22 at 16:09
  • //?/ is fine for some things, but in my experience it will most likely not work when dealing with legacy apps that use MAX_PATH and similar limitations for input argument buffers. – mwallner Jan 27 '22 at 07:16

2 Answers

4

I would Base32-encode the GUID. With 26 characters the result will be slightly longer than Base64, but you won't lose bits of randomness on case-insensitive file systems. Also it uses only basic alphanumeric characters, which IMO looks better ;-).
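
The length figures follow from simple arithmetic: a GUID has 128 bits, and each encoding stores a fixed number of bits per character. A small sketch of that calculation (the function name `Get-EncodedLength` is made up for illustration, and padding characters are not counted):

```powershell
# Number of characters needed to hold $Bits bits at $BitsPerChar bits per character.
function Get-EncodedLength {
    param([int]$Bits, [int]$BitsPerChar)
    [int][math]::Ceiling([double]$Bits / $BitsPerChar)
}

'Hex:    {0}' -f (Get-EncodedLength 128 4)   # Hex:    32
'Base32: {0}' -f (Get-EncodedLength 128 5)   # Base32: 26
'Base64: {0}' -f (Get-EncodedLength 128 6)   # Base64: 22
```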

Unfortunately, there is no built-in Base32 encoder in .NET. I first adapted the encoding part of this C# answer, but then I got fancy and modified it to use the z-base32 variant, which is easier on the human eye and saves a few characters by not using padding.

Add-Type -TypeDefinition @'
public class ZBase32Encoder {
    private const string Charset = "ybndrfg8ejkmcpqxot1uwisza345h769";

    public static string ToString(byte[] input) {
        if (input == null || input.Length == 0) {
            throw new System.ArgumentNullException("input");
        }

        long returnLength = (input.Length * 8L - 1) / 5 + 1;
        if( returnLength > (long) System.Int32.MaxValue ) {
            throw new System.ArgumentException("Argument length is too large. (Parameter 'input')");
        }
        
        char[] returnArray = new char[returnLength];

        byte nextChar = 0, bitsRemaining = 5;
        int arrayIndex = 0;

        foreach (byte b in input) {
            nextChar = (byte)(nextChar | (b >> (8 - bitsRemaining)));
            returnArray[arrayIndex++] = Charset[nextChar];
            
            if (bitsRemaining < 4) {
                nextChar = (byte)((b >> (3 - bitsRemaining)) & 31);
                returnArray[arrayIndex++] = Charset[nextChar];
                bitsRemaining += 5;
            }
            
            bitsRemaining -= 3;
            nextChar = (byte)((b << bitsRemaining) & 31);
        }

        //if we didn't end with a full char
        if (arrayIndex != returnLength) {
            returnArray[arrayIndex++] = Charset[nextChar];
        }

        return new string(returnArray);
    }
}
'@

foreach( $i in 1..5 ) {
    $s = [ZBase32Encoder]::ToString($(New-Guid).ToByteArray())
    "$s (Length: $($s.Length))"
}

Output:

81u68ug6txxwpbqz4znzgq3hfa (Length: 26)
sseik38xykrr5n99zedj96nsoy (Length: 26)
a353cgcyhefwdc8euk34zbytxa (Length: 26)
e3x8zd576zcrzn3nyxwxncenho (Length: 26)
7idr4xencm9rmp8wkzidk1fyhe (Length: 26)
zett42
  • Nice implementation, yet there's an extra `\u0000\u0000\u0000\u0000\u0000\u0000` attached to each generated string, which makes it hard to use as-is. – mwallner Jan 27 '22 at 11:07
  • @mwallner Good catch, I've fixed the output string length calculation. – zett42 Jan 27 '22 at 14:17
2

Convert your GUID to Base64, which gives you a 24-character string. As mentioned by zett42, you need to replace any slash (/) that may occur. You can also save another two characters by removing the unnecessary padding:

[System.Convert]::ToBase64String((New-Guid).ToByteArray()).Substring(0,22).Replace('/', '-')
zp92wiHcdU+0Eb9Cw2z0VA

BUT there is actually a flaw in this idea: on Windows, folder names are case-insensitive, meaning that the folder names might not be as unique as the original GUIDs.
Therefore you might want to fall back on Base32 (which needs 26 characters). This is a little more complex, as there is no standard .NET method for it:

$Chars = ('A'..'Z') + ('2'..'7')
$Bytes = (New-Guid).ToByteArray()
$Bits  = -join $Bytes.ForEach{ [Convert]::ToString($_, 2).PadLeft(8, '0') }
-Join ($Bits -Split '(?<=\G.{5})').foreach{ $Chars[[Convert]::ToInt32($_, 2)] }
DZ77OUQNDRQUTGP5ATAM7KCWCB
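
As far as I know, the character range operator (`'A'..'Z'`) is only available in PowerShell 6 and later. The following is a sketch of a variant that should also run in Windows PowerShell 5.1 and accepts an externally supplied GUID. The function name `ConvertTo-Base32Name` is hypothetical. Unlike the snippet above, it pads the bit string on the right to a multiple of 5 (as RFC 4648 does for the final group) and omits the `=` padding characters, so its output will differ from the snippet above even for the same GUID:

```powershell
function ConvertTo-Base32Name {
    param([Guid]$Guid = [Guid]::NewGuid())

    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'   # RFC 4648 Base32 alphabet

    # 16 bytes -> one string of 128 '0'/'1' characters.
    $bits = -join ($Guid.ToByteArray() | ForEach-Object {
        [Convert]::ToString($_, 2).PadLeft(8, '0')
    })

    # Right-pad with zero bits to a multiple of 5 (128 -> 130).
    $bits = $bits.PadRight([int]([math]::Ceiling($bits.Length / 5.0) * 5), '0')

    # Map each 5-bit group to one character of the alphabet.
    $sb = New-Object System.Text.StringBuilder
    for ($i = 0; $i -lt $bits.Length; $i += 5) {
        [void]$sb.Append($alphabet[[Convert]::ToInt32($bits.Substring($i, 5), 2)])
    }
    $sb.ToString()
}

ConvertTo-Base32Name -Guid ([Guid]::Empty)   # AAAAAAAAAAAAAAAAAAAAAAAAAA
```

Because the mapping is deterministic, the same GUID always yields the same 26-character name, which matters when the name has to match across multiple hosts.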

You might do something similar to include special characters, but I would be very careful about that as not every file system might support that.
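
The case-insensitivity problem with Base64 mentioned earlier in this answer can be shown with two hand-picked byte arrays (chosen purely for illustration, they are not real GUIDs). Their Base64 names differ only in the case of the first character, so a case-insensitive file system would treat them as the same folder:

```powershell
$bytesA = New-Object byte[] 16   # 16 zero bytes
$bytesB = New-Object byte[] 16
$bytesB[0] = 0x68                # top 6 bits = 26, which Base64 encodes as 'a'

$nameA = [Convert]::ToBase64String($bytesA).Substring(0, 22)   # AAAAAAAAAAAAAAAAAAAAAA
$nameB = [Convert]::ToBase64String($bytesB).Substring(0, 22)   # aAAAAAAAAAAAAAAAAAAAAA

$nameA -ceq $nameB   # False: distinct when compared case-sensitively
$nameA -ieq $nameB   # True: identical when case is ignored
```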

iRon
  • love it, works just as expected. thanks! – mwallner Jan 27 '22 at 11:08
  • The RegEx Base32 converter is interesting, but produces a string that is 2 characters too long. – zett42 Jan 27 '22 at 14:22
  • @zett42, I noticed the two-character difference yesterday but didn't have the time to investigate. Anyway, it was caused by the fact that the number of `$Bits` was not a multiple of `5`. I have fixed that in the answer. The string is now also 26 characters long. The characters themselves still differ between our solutions, but that is probably due to the left/right padding and the character set used. – iRon Jan 27 '22 at 16:22
  • I have simplified the conversion commands. Note that the resulting 26-character string should still be unique but is not compatible with previous versions. – iRon Jan 27 '22 at 17:53