
I am trying to create a file using the windows.CreateFile() function (for reference, see https://godoc.org/golang.org/x/sys/windows#CreateFile and https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew ) in Go 1.14. Although the code runs, I am clearly passing a wrong value for the file name parameter of CreateFile().

The code is:

package main

import (
    "unsafe"

    "golang.org/x/sys/windows"
)

func main() {
    var (
        nullHandle windows.Handle
        filename   string = "test_file"
    )

    strptr := &filename
    fileNamePtr := (*uint16)(unsafe.Pointer(strptr))
    dwShareMode := uint32(windows.FILE_SHARE_READ | windows.FILE_SHARE_WRITE | windows.FILE_SHARE_DELETE)
    dwFlagsAndAttributes := uint32(windows.FILE_FLAG_DELETE_ON_CLOSE)

    windows.CreateFile(fileNamePtr, windows.GENERIC_WRITE, dwShareMode, nil, windows.CREATE_NEW, dwFlagsAndAttributes, nullHandle)
}

and I am getting a file created with a name made of non-ASCII characters (in this case 庡R):

Directory of C:\Users\rodrigo\src\delete_on_close

04/30/2020  03:15 PM    <DIR>          .
04/30/2020  03:15 PM    <DIR>          ..
04/30/2020  03:12 PM               715 main.go
04/30/2020  03:14 PM         2,698,240 __debug_bin
04/30/2020  03:15 PM                 0 庡R
               3 File(s)      2,698,955 bytes
...

Moreover, this name varies on each run, so I suspect I am not pointing at my filename variable correctly. Any ideas? (Thank you in advance.)

Rodrigo
    Unless you have a *very very* good reason not to, you should create files with [os.Create](https://golang.org/pkg/os/#Create). `x/` packages are not included in the Go 1 Compatibility Guarantee, and `unsafe` is, well, unsafe. – Adrian Apr 30 '20 at 14:10

2 Answers


The problem

var filename string = "test_file"
strptr := &filename
fileNamePtr := (*uint16)(unsafe.Pointer(strptr))

is incorrect on multiple levels:

  1. A string in Go is a struct-typed value containing two fields: a pointer to the first byte of the string's data and an integer containing the length of the string (in bytes)—basically it's defined like this:

    type string struct {
        ptr *byte
        len int
    }
    

    Hence taking the address of a Go string variable yields the address of the memory location that holds the pointer to the string's data (the ptr field above), not the address of the data itself.

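    To get the address of the first byte of the string's data, you might try &filename[0], but Go does not allow taking the address of a byte of a string (strings are immutable), so you would first have to copy the data into a []byte. Even that would still be incorrect in your case, so bear with me.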

  2. Go strings contain opaque bytes.

    Several places in Go do assume a particular encoding of Go strings, namely UTF-8, and that is what any Go tutorial will tell you, but in reality strings may contain arbitrary bytes, in any encoding or in none at all.
    This means the way to re-encode a string to some target encoding must be decided on a case-by-case basis, taking into account the encoding of the source string.

    Luckily, your particular case is the simplest one.
    Since Go source code files are defined to be encoded in UTF-8, Go strings which were defined as string literals (and your filename variable gets assigned a value defined by a string literal) are encoded in UTF-8.

    UTF-8 is a variable-length encoding, which uses 1 to 4 bytes per encoded Unicode code point—depending on its integer value.

    The Win32 API function you intend to call wants a string encoded in UTF-16.
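    UTF-16 is also a variable-length encoding: it uses one 16-bit code unit (2 bytes) for each code point in the Basic Multilingual Plane and two code units (a surrogate pair, 4 bytes) for every other code point.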

    I think by now it should be obvious that a "reinterpreting" cast of a pointer to a UTF-8-encoded string into a pointer to a UTF-16-encoded string does nothing to the contents of that string: they remain encoded in UTF-8.
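To make the two points above concrete, here is a small sketch (not part of the original answer) that compares the size of a string value itself with the size of its text in UTF-8 and in UTF-16. The helper name describe and the sample strings are made up for illustration; the last sample is a code point above U+FFFF, which needs two UTF-16 code units.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unsafe"
)

// describe is a hypothetical helper: it reports the size of the string
// header, the number of UTF-8 bytes, and the number of UTF-16 code units.
func describe(s string) (headerSize uintptr, utf8Bytes int, utf16Units int) {
	headerSize = unsafe.Sizeof(s)             // size of the header (pointer + length), not of the text
	utf8Bytes = len(s)                        // len counts the bytes of the string's data
	utf16Units = len(utf16.Encode([]rune(s))) // 16-bit code units after a real conversion
	return headerSize, utf8Bytes, utf16Units
}

func main() {
	for _, s := range []string{"test_file", "na\u00efve", "\U0001F600"} {
		h, b, u := describe(s)
		fmt.Printf("%q: header=%d bytes, UTF-8=%d bytes, UTF-16=%d code units\n", s, h, b, u)
	}
}
```

The header size is the same for every string, which is why a pointer to the string variable says nothing about where the text lives or how it is encoded.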

The solution

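So, you first need to do a proper conversion: decode the Unicode code points ("runes") in the source string one by one and encode each into 16-bit code units in the destination buffer, using one unit for code points up to U+FFFF and a surrogate pair for the rest, and then append a terminating NUL. Allocating two bytes per rune, plus two for the terminator, is enough only when the string has no code points above U+FFFF, which is the case for "test_file". (Windows uses the little-endian format for UTF-16.)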
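A hand-rolled conversion along those lines might look like the sketch below. The function name encodeUTF16Z is made up for illustration, and the sketch leans on utf16.EncodeRune from the standard library for the surrogate-pair arithmetic. It does not check for NUL characters inside the input.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeUTF16Z is a hypothetical helper that converts a UTF-8 Go string
// into a NUL-terminated slice of UTF-16 code units.
func encodeUTF16Z(s string) []uint16 {
	// The byte length is an upper bound on the number of code units;
	// one extra slot is reserved for the terminator.
	out := make([]uint16, 0, len(s)+1)
	for _, r := range s { // ranging over a string decodes UTF-8 into runes
		if r >= 0x10000 {
			// Code points above U+FFFF need a surrogate pair.
			hi, lo := utf16.EncodeRune(r)
			out = append(out, uint16(hi), uint16(lo))
		} else {
			out = append(out, uint16(r))
		}
	}
	return append(out, 0) // terminating NUL expected by "wide" APIs
}

func main() {
	fmt.Printf("%#04x\n", encodeUTF16Z("test_file"))
	fmt.Printf("%#04x\n", encodeUTF16Z("\U0001F600"))
}
```

Because the result is a []uint16, each element is stored in the CPU's native byte order, which is little-endian on the platforms Windows runs on.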

While you may roll your own implementation as described above, Go already has it in the standard library's syscall package in the form of the

func UTF16FromString(s string) ([]uint16, error)

function.

So your code should become something like

u16fname, err := syscall.UTF16FromString(filename)
if err != nil {
  // fail
}

windows.CreateFile(&u16fname[0], ...)

Note that you might see what's available in the syscall package by reading the output of go doc syscall.

If you're not on the target OS, run GOOS=windows go doc syscall.

And note that https://golang.org/pkg/syscall renders the documentation for GOOS=linux, so it is of little use when you want to use Windows-specific stdlib code.


If you're curious: in your case you passed CreateFileW the address of the string value itself, so the function interpreted the raw memory of that value as UTF-16 text. The first 8 bytes are the 64-bit data pointer, which it read as up to four consecutive UTF-16 code units. Had none of those been zero, it would have proceeded to the length field, which contained 0x0000000000000009 (the length of the string "test_file" in bytes): it would have read the first 0x0009, interpreted it as a TAB character, and then stopped at the following 0x0000, since that is a UTF-16-encoded NUL (which terminates strings in the "wide" Win32 API).
It may also stop earlier, depending on the actual value of the pointer: if the pointer has 0x0000 in one of its upper words, that word serves as the NUL terminator. Your two-character file name suggests that this is what happened.
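The effect can be simulated without calling any Win32 API. The sketch below is an illustration only: it lays out a fake 64-bit string header in a byte slice and reads it the way a "wide" API would, 16 bits at a time until the first zero unit. The helper names and the pointer values are made up; the first pointer is derived from the file name shown in the question.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// readWideString mimics a "wide" API: it reads little-endian 16-bit units
// from mem until it meets 0x0000 and decodes them as UTF-16.
func readWideString(mem []byte) string {
	var units []uint16
	for i := 0; i+1 < len(mem); i += 2 {
		u := binary.LittleEndian.Uint16(mem[i:])
		if u == 0 {
			break
		}
		units = append(units, u)
	}
	return string(utf16.Decode(units))
}

// fakeStringHeader lays out a 64-bit data pointer followed by a 64-bit
// length, as a string value is assumed to look in memory on amd64.
func fakeStringHeader(ptr, length uint64) []byte {
	mem := make([]byte, 16)
	binary.LittleEndian.PutUint64(mem[0:], ptr)
	binary.LittleEndian.PutUint64(mem[8:], length)
	return mem
}

func main() {
	// A pointer whose two low words spell the name seen in the question
	// and whose upper words are zero: reading stops inside the pointer.
	name := utf16.Encode([]rune("庡R"))
	ptr := uint64(name[0]) | uint64(name[1])<<16
	fmt.Printf("%q\n", readWideString(fakeStringHeader(ptr, 9)))

	// A made-up pointer with no zero word: reading runs on into the
	// length field and picks up 0x0009 (TAB) before stopping.
	fmt.Printf("%q\n", readWideString(fakeStringHeader(0x0041004200430044, 9)))
}
```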

kostix
    @Rodrigo, I have fixed the code example: it should have been `windows.CreateFile(&u16fname[0], ...)`, so that you get the address of the first element of the resulting slice. Sorry about the confusion. (Without the `[0]` bit you'd get the address of the slice value itself, which, much like a string, is a `struct` type, here with a pointer and two integers.) – kostix Apr 30 '20 at 15:13
    Note that UTF-16 encoding of certain Unicode runes requires two UTF-16 values. These are the [surrogate pairs](https://stackoverflow.com/questions/31986614/what-is-a-surrogate-pair). In this case we know that the file name itself consists of non-surrogate-pair runes, so 2 times (number-of-runes) is OK, and of course if you use someone else's debugged routines like `UTF16FromString` you get all that for free :) – torek Apr 30 '20 at 22:35

Referring to this...

In Windows, some procedures which take string arguments have two variants: one for ANSI-encoded and one for UTF-16-encoded strings. Whichever you choose, neither of these string types is directly compatible with Go strings. In order to use them, you'll need to construct compatible strings.

You may use something like this to convert Go strings to NUL-terminated UTF-16 strings:

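// StringToUTF16Ptr needs the "unicode/utf16" package from the standard library.
func StringToUTF16Ptr(str string) *uint16 {
    // Append the NUL terminator, then encode the runes as UTF-16 code units.
    wchars := utf16.Encode([]rune(str + "\x00"))
    // The address of the first element is what a "wide" Win32 API expects.
    return &wchars[0]
}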
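One thing the helper above does not do is reject input that already contains a NUL character, in which case the receiving API would see a truncated name. A variant that reports this as an error could be sketched as follows; the names utf16zChecked and errEmbeddedNUL are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf16"
)

// errEmbeddedNUL is a hypothetical error value for this sketch.
var errEmbeddedNUL = errors.New("string contains a NUL character")

// utf16zChecked is a hypothetical helper: it returns the NUL-terminated
// UTF-16 encoding of s, or an error if s contains a NUL anywhere.
func utf16zChecked(s string) ([]uint16, error) {
	if strings.IndexByte(s, 0) >= 0 {
		return nil, errEmbeddedNUL
	}
	return append(utf16.Encode([]rune(s)), 0), nil
}

func main() {
	buf, err := utf16zChecked("test_file")
	fmt.Println(len(buf), err)

	_, err = utf16zChecked("test\x00file")
	fmt.Println(err)
}
```

Returning the slice rather than a pointer leaves it to the caller to take &buf[0] at the call site.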

A word of caution (from "Go Proverbs" by Rob Pike)

With the unsafe package there are no guarantees.

Jay