
string(42) converts the integer constant 42 to an array of bytes of length 1, whose first element is 00101010:

package main

import "fmt"

func main() {
    s := string(42)
    fmt.Printf("%d\n", len(s)) // 1
    fmt.Printf("%b\n", s[0]) // 101010 looks good
}

But the code below, which also takes a valid integer constant, gives a result I did not expect:

package main

import "fmt"

func main() {
    s := string(1024)
    fmt.Printf("%d\n", len(s)) // 2

    fmt.Printf("%b %b\n", s[0], s[1]) // 11010000 10000000; this looks wrong, I expected 00000100 00000000
}

The code below also takes a valid, but much larger, integer constant:

package main

import "fmt"

func main() {
    s := string(4254353345467546745674564564567445674658647567567853467867568756756785786785676858878978978978978907978977896789676786789655289890980889098835432453455544)
    fmt.Printf("%d\n", len(s))
    fmt.Printf("%d %d %d", s[0], s[1], s[2])
}

This converts it to an array of 3 bytes: 239 191 189.

But that is not the right representation of this integer constant; it should take more than 3 bytes.


How can I retrieve the bytes of a given integer constant?

Zombo
overexchange
  • The [conversion](https://golang.org/ref/spec#Conversions_to_and_from_a_string_type) used in the question converts a rune to the UTF-8 representation of the rune. Invalid Unicode code points are converted to \uFFFD as you observed. – Charlie Tumahai Sep 26 '20 at 15:59
  • @MuffinTop `1024` is a integer constant. I think, `string(1024)` is considering `1024` as unicode code point and encoding the unicode code point with UTF-8 as `11010000 10000000`. Correct me – overexchange Sep 26 '20 at 16:08
  • Yes, the bytes `11010000 10000000` are the UTF-8 representation of the Unicode code point 1024. – Charlie Tumahai Sep 26 '20 at 16:37
  • "string(42) convert interger constant 42 to an array of bytes". No. `string(42)` converts the untyped constant 42 to a string. A string and an array of bytes are different types in Go. – Volker Sep 26 '20 at 17:37
  • @thwd: Oh yes, of course. – Jonathan Hall Sep 26 '20 at 22:31
  • @MuffinTop So, `string(1024)` is converted to `string([]byte{208, 128})` which is `Ѐ`. Correct me – overexchange Sep 30 '20 at 05:35
  • @overexchange see https://play.golang.org/p/LeodvCBfNeW – Charlie Tumahai Sep 30 '20 at 11:24
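
The encoding confirmed in the comments can be checked by hand. What follows is only a minimal sketch: `encodeTwoByte` is a hypothetical helper, not part of any library, and it assumes the code point lies in the two-byte range U+0080..U+07FF, where UTF-8 uses the layout 110xxxxx 10xxxxxx. The result is cross-checked against `utf8.EncodeRune` from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeTwoByte is a hypothetical helper that hand-encodes a code point in
// the range U+0080..U+07FF using the two-byte UTF-8 layout 110xxxxx 10xxxxxx.
// It does not validate its input.
func encodeTwoByte(r rune) [2]byte {
	return [2]byte{
		0xC0 | byte(r>>6),   // marker bits 110 followed by the top 5 bits
		0x80 | byte(r&0x3F), // marker bits 10 followed by the low 6 bits
	}
}

func main() {
	const cp = 1024 // U+0400, binary 100 0000 0000 (11 bits)

	manual := encodeTwoByte(cp)
	fmt.Printf("%08b %08b\n", manual[0], manual[1]) // 11010000 10000000

	// Cross-check against the standard library encoder.
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, cp)
	fmt.Println(n, buf[:n]) // 2 [208 128]
}
```

The 11 payload bits of 1024 are spread across the two bytes, which is why the output does not look like the plain binary form 00000100 00000000.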

2 Answers

The main issue is that basic numeric types cannot handle a number this large. As a matter of fact, if you just try something like:

x := 4254353345467546745674564564567445674658647567567853467867568756756785786785676858878978978978978907978977896789676786789655289890980889098835432453455544

The build will fail with: 42543...5544 overflows int

To do what you're looking for, you need to do two things:

  • write the value in your source in a form that can hold a number of this size
  • parse it into a data type that can represent arbitrarily large integers

The easiest way is to use a string for the first and big.Int for the second:

package main

import (
    "fmt"
    "math/big"
)

func main() {
    largeNum := "4254353345467546745674564564567445674658647567567853467867568756756785786785676858878978978978978907978977896789676786789655289890980889098835432453455544"
    i, ok := big.NewInt(0).SetString(largeNum, 10)
    if !ok {
        panic("big.Int SetString failed")
    }
    fmt.Println(i)
    fmt.Println(i.Bytes())
}

This will output both the base-10 string representation of the big.Int (the same thing you put in) and its bytes in big-endian byte order:

4254353345467546745674564564567445674658647567567853467867568756756785786785676858878978978978978907978977896789676786789655289890980889098835432453455544
[81 58 216 146 57 48 179 246 202 93 83 128 121 181 65 161 52 211 183 127 131 99 227 65 100 227 35 171 8 45 246 240 131 6 183 2 149 204 10 62 88 195 78 51 233 238 225 162 144 75 54 210 134 17 37 22 20 217 213 213 67 96 62 184]
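
For values that do fit a fixed-width integer type, such as the 1024 from the question, the expected bytes 00000100 00000000 can be produced with `encoding/binary`. The sketch below is illustrative only: `uint16Bytes` is a hypothetical helper name, and the choice of a 16-bit width is an assumption made for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/big"
)

// uint16Bytes is a hypothetical helper that returns the two-byte
// big-endian representation of v.
func uint16Bytes(v uint16) []byte {
	b := make([]byte, 2)
	binary.BigEndian.PutUint16(b, v)
	return b
}

func main() {
	// Fixed width: 1024 always occupies exactly two bytes here.
	b := uint16Bytes(1024)
	fmt.Printf("%08b %08b\n", b[0], b[1]) // 00000100 00000000

	// Minimal width: big.Int.Bytes drops leading zero bytes.
	fmt.Println(big.NewInt(1024).Bytes()) // [4 0]
	fmt.Println(big.NewInt(42).Bytes())   // [42]
}
```

`encoding/binary` gives a fixed number of bytes for a chosen integer width, while `big.Int.Bytes` returns only as many bytes as the value needs.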
Marc
  • A `const` *can* represent a number [that large](https://play.golang.org/p/k3RTzQ8EMzX). It will of course [overflow](https://play.golang.org/p/Chrmpyeyorf) at runtime - as it does not fit into the largest Go integer type of int64. But the compiler can take two large `const` and make them resolve [at run time](https://play.golang.org/p/wJCSXu-EZGV). – colm.anseo Sep 26 '20 at 16:10
  • Because the compiler does the division for you and the result fits in a standard type. But do anything else with the raw constants (such as passing it to `big.Int`) and it will fail. – Marc Sep 26 '20 at 16:12
  • My comment covers that. Your answer still has the incorrect statement "Constant numeric types cannot represent a number this large". From the [docs](https://golang.org/ref/spec#Constants): "Numeric constants represent exact values of arbitrary precision and do not overflow." – colm.anseo Sep 26 '20 at 16:30
  • Fair enough, reworded. – Marc Sep 26 '20 at 16:46
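
The point about arbitrary-precision constants in these comments can be illustrated with a short sketch. The names `huge` and `small` are made up for this example, and `1 << 100` stands in for the much longer literal from the question.

```go
package main

import "fmt"

// huge is an untyped integer constant far beyond the range of int64.
const huge = 1 << 100

// Constant expressions are evaluated exactly at compile time, so this
// result fits in an ordinary int even though huge itself does not.
const small = huge >> 98

func main() {
	fmt.Println(small) // 4

	// Assigning huge directly to a variable would be rejected at compile
	// time, because the constant cannot be represented as an int:
	// var x int = huge
}
```

The large constant only becomes a problem at the moment it has to be given a concrete type such as int.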

The other answer is good, but I wanted to add some context about why this happens. When you pass an integer to string(), Go interprets it as a Unicode code point and returns that code point's UTF-8 encoding. Normally you would write the code point in hex, as below:

package main
import "fmt"

func main() {
   { // example 1
      s := string(0x1F600)
      t := fmt.Sprintf("%X", s)
      fmt.Println(t == "F09F9880")
   }
   { // example 2
      s := string(0x10FFFF + 1)
      t := fmt.Sprintf("%X", s)
      fmt.Println(t == "EFBFBD")
   }
}

Unicode code points max out at U+10FFFF (0x10FFFF, or 1114111 in decimal). If you pass anything larger, as you did, you get EFBFBD back, which is the UTF-8 encoding of REPLACEMENT CHARACTER (U+FFFD).
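
Whether a value will survive the conversion can be checked up front with `utf8.ValidRune`. This is a rough sketch rather than a definitive recipe: `encode` is a hypothetical helper, and the conversion is written as `string(rune(...))` to make the code-point interpretation explicit.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode is a hypothetical helper that returns the UTF-8 bytes produced by
// converting the code point cp to a string.
func encode(cp rune) []byte {
	return []byte(string(cp))
}

func main() {
	for _, cp := range []rune{
		0x1F600,  // valid code point, encodes to F0 9F 98 80
		0x110000, // one past the maximum, replaced by U+FFFD
		0xD800,   // surrogate half, also replaced by U+FFFD
	} {
		fmt.Printf("%U valid=%t bytes=% X\n", cp, utf8.ValidRune(cp), encode(cp))
	}
}
```

Any input for which `utf8.ValidRune` reports false ends up as the same three bytes EF BF BD, so the original number cannot be recovered from the resulting string.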

Zombo