Extracting substrings in Go

Question

I'm trying to read an entire line from the console (including whitespace), then process it. Using bufio.ReadString, the newline character is read together with the input, so I came up with the following code to trim the newline character:

input,_:=src.ReadString('\n')
inputFmt:=input[0:len(input)-2]+"" //Need to manually add end of string

Is there a more idiomatic way to do this? That is, is there already a library that takes care of the ending null byte when extracting substrings for you?

(Yes, I know there is already a way to read a line without the newline character in go readline -> string but I'm looking more for elegant string manipulation.)

Denys Séguret · Accepted Answer · 2019-02-01T14:34:38.413

231

It looks like you're confused by the working of slices and the string storage format, which is different from what you have in C.

Any slice or string in Go stores its length (in bytes, for a string), so you don't have to worry about the cost of the len operation: nothing needs to be counted.
Go strings aren't null-terminated, so there is no null byte to remove, and you don't have to append an empty string after slicing to restore a terminator.

To remove the last char (if it's a one byte char), simply do

inputFmt:=input[:len(input)-1]
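
As a rough illustration of the two points above, here is a minimal sketch. The helper name dropLastByte is made up for this example; it shows that len counts bytes, that no terminator is involved, and what happens when the last character is wider than one byte.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// dropLastByte (hypothetical helper) removes the final byte of s,
// or returns s unchanged if it is empty.
func dropLastByte(s string) string {
	if s == "" {
		return s
	}
	return s[:len(s)-1]
}

func main() {
	line := "hello\n"
	fmt.Println(len(line))                 // 6: five letters plus the newline, no terminator
	fmt.Printf("%q\n", dropLastByte(line)) // "hello"

	word := "café" // 'é' occupies two bytes in UTF-8
	fmt.Println(len(word), utf8.RuneCountInString(word)) // 5 4
	fmt.Println(utf8.ValidString(dropLastByte(word)))    // false: 'é' was cut in half
}
```

For a trailing '\n', which is always a single byte, slicing off one byte is safe; the last two lines show why the "one byte char" caveat matters.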

edited Feb 01 '19 at 14:34

answered Sep 07 '12 at 07:39

Denys Séguret


22

You don't even need the 0 (or the :), `s = s[:len(s)-1]` will do. – uriel Sep 07 '12 at 15:06
19

Please note that this method will not work with Unicode strings! https://groups.google.com/forum/#!msg/golang-nuts/ZeYei0IWrLg/PfPnAy_TVsMJ – Melllvar Aug 24 '13 at 00:02
2

@Melllvar That's why I specified *"if it's a one byte char"*. If you want to remove a char taking more than one byte (that's not OP's case), you have to adapt. – Denys Séguret Aug 24 '13 at 07:16
1

This answer is misleading. This is the correct answer: https://stackoverflow.com/a/56129336/1035334 – FiftiN Nov 30 '20 at 11:01
3

@FiftiN This answer works in most cases, even with UTF-8, as long as you don't treat the index as a character index. UTF-8 is designed so that you can search for substrings by byte comparison; if you search for something and then use the resulting offset to take a substring, everything works out without the code being explicitly UTF-8 aware, because the people who designed UTF-8 were smart enough. – Jimmy T. Dec 26 '20 at 00:46
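
The point about byte offsets can be sketched as follows. This is only an illustration: the helper name after and the sample strings are invented here, and it assumes both the input and the separator are valid UTF-8.

```go
package main

import (
	"fmt"
	"strings"
)

// after (hypothetical helper) returns the part of s that follows the first
// occurrence of sep, or "" if sep does not occur in s.
func after(s, sep string) string {
	// strings.Index returns a byte offset. When s and sep are both valid
	// UTF-8, a match can only start on a character boundary.
	i := strings.Index(s, sep)
	if i < 0 {
		return ""
	}
	return s[i+len(sep):]
}

func main() {
	fmt.Println(after("naïve café: ouvert", ": ")) // ouvert
	fmt.Println(after("日本語=にほんご", "="))          // にほんご
}
```

The slice bounds come from the search itself, so no character counting is needed even though the text before the separator contains multibyte characters.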

joonas.fi · Answer 2 · 2021-01-03T11:57:54.580

89

WARNING: indexing a string directly operates on bytes, so it only works as expected with ASCII. With non-ASCII UTF-8 input the counts will be wrong, and slicing may even corrupt characters by cutting a multibyte sequence in the middle.

Here's a UTF-8-aware version:

// NOTE: this isn't multi-Unicode-codepoint aware, like specifying skintone or
//       gender of an emoji: https://unicode.org/emoji/charts/full-emoji-modifiers.html
func substr(input string, start int, length int) string {
    asRunes := []rune(input)
    
    if start >= len(asRunes) {
        return ""
    }
    
    if start+length > len(asRunes) {
        length = len(asRunes) - start
    }
    
    return string(asRunes[start : start+length])
}
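
For the narrower task of dropping only the last character, converting the whole string to []rune is not strictly necessary. The following is a sketch of an alternative, not part of the answer above: trimLastRune is a name invented here, built on utf8.DecodeLastRuneInString from the standard library. Like substr, it works on code points and is not aware of multi-code-point symbols.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// trimLastRune (hypothetical helper) removes the final code point of s.
// For an empty string the decoded size is 0, so s is returned unchanged;
// for an invalid trailing byte the size is 1, so one byte is removed.
func trimLastRune(s string) string {
	_, size := utf8.DecodeLastRuneInString(s)
	return s[:len(s)-size]
}

func main() {
	fmt.Println(trimLastRune("café"))     // caf
	fmt.Printf("%q\n", trimLastRune("a\n")) // "a"
	fmt.Printf("%q\n", trimLastRune(""))    // ""
}
```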

edited Jan 03 '21 at 11:57

answered May 14 '19 at 11:27

joonas.fi


9

This needs way more upvotes - I just got bitten badly by not using utf-8 aware splitting. – kolaente Jul 07 '20 at 07:14
2

This doesn't really work either: there are cases where only a sequence of runes makes up a symbol. For example, the emoji for country flags are represented by two runes. Do you really want to cut that in two? I guess there's a reason why such a function isn't provided by default: there's just no universally correct way to split Unicode strings without context. Other than that, if you take substrings based on indexes you got by searching for substrings, you don't need a UTF-8-aware substring function; UTF-8 was designed so that it works even with functions made for binary data. – Jimmy T. Dec 29 '20 at 23:30
@JimmyT. great point that you need to decide what exactly you want to support, but I'd rather say that splitting by runes is much better than splitting by bytes, though still not perfect. But you have a good point: in this case people usually want to extract N amount of visible symbols (hence substr(): strings are not meant for bytes, but for visible characters). Usually a visible symbol is a single Unicode code point, but you make a great point saying that one can combine multiple code points into one visible symbol, and my code doesn't account for that. I added a warning to my code, thanks! – joonas.fi Jan 03 '21 at 11:54
Isn't there some standard function for this? In the `strings` package, for example? – nikserg Jan 19 '22 at 07:00
@joonas.fi It's worse than just that one can combine multiple code points into one visible symbol; different platforms/fonts will combine them differently, either merging with the ZWJ character, or showing two glyphs. Eg, the bytes 0xf09f8fb3efb88fe2808df09f8c88 might be shown as either a single rainbow flag (🏳️‍🌈) or a white flag followed by a rainbow (🏳️ 🌈), and both are "valid" interpretations of those bytes as UTF8. There's really no way to know what a "glyph" will be unless you handle strings differently based on where the client might be rendering them. Since the OP is asking about removing trailing \n, of course, this is all a bit academic :) – isaacs Mar 03 '22 at 21:57

score 31 · Answer 3 · answered Sep 07 '12 at 15:10

Go strings are not null terminated, and to remove the last char of a string you can simply do:

s = s[:len(s)-1]

uriel


20

This is incorrect and will cause bugs. This strips the last *byte* off the string, which may render it invalid UTF-8 (or other multibyte encoding). – dr. Sybren Oct 24 '17 at 10:53
6

See https://play.golang.org/p/K3HBBtj4Oi for an example of how this breaks. – dr. Sybren Oct 24 '17 at 11:04

score 18 · Answer 4 · edited Feb 25 '23 at 14:45

This is a simple way to take a substring in Go:

package main

import "fmt"

func main() {

  value := "address;bar"

  // Take the substring from byte index 2 to the end of the string
  substring := value[2:]
  fmt.Println(substring)

}

Telegrapher


answered May 23 '18 at 07:37

Faris Rayhan


score 11 · Answer 5 · answered Feb 05 '15 at 18:37

To avoid a panic on zero-length input, wrap the truncate operation in an if:

input, _ := src.ReadString('\n')
var inputFmt string
if len(input) > 0 {
    inputFmt = input[:len(input)-1]
}
// Do something with inputFmt
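
The length check above still removes the final byte even when the input ends without a newline (for example, the last line before EOF). A possible alternative, sketched here under the assumption that only a line ending should be stripped, is strings.TrimSuffix, which is a no-op when the suffix is absent. The helper name stripNewline and the sample input are made up for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// stripNewline (hypothetical helper) removes one trailing "\n" and then
// one trailing "\r", if present. It never panics, even on empty input.
func stripNewline(s string) string {
	s = strings.TrimSuffix(s, "\n")
	return strings.TrimSuffix(s, "\r")
}

func main() {
	// A strings.Reader stands in for the console here.
	src := bufio.NewReader(strings.NewReader("first line\r\nlast line, no newline"))
	for {
		input, err := src.ReadString('\n')
		fmt.Printf("%q\n", stripNewline(input))
		if err != nil { // io.EOF once the input is exhausted
			break
		}
	}
}
```

This prints "first line" and then "last line, no newline", with the final character of the second line left intact.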

Rohanthewiz


score 3 · Answer 6 · answered Nov 03 '15 at 08:51

To get a substring:

find the position of "sp"
cut the string using slice syntax

https://play.golang.org/p/0Redd_qiZM

TeeTracker


score 3 · Answer 7 · answered Jan 30 '20 at 08:37

8 years later I stumbled upon this gem, and yet I don't believe OP's original question was really answered:

so I came up with the following code to trim the newline character

While the bufio.Reader type supports a ReadLine() method that removes both \r\n and \n, it is meant as a low-level function and is awkward to use because repeated checks are necessary.

IMO an idiomatic way to remove whitespace is to use Go's strings package:

input, _ := src.ReadString('\n')

// more specific to the problem of trailing newlines
actual := strings.TrimRight(input, "\r\n")

// or, if you don't mind trimming leading and trailing whitespace
actual = strings.TrimSpace(input)

See this example in action in the Golang playground: https://play.golang.org/p/HrOWH0kl3Ww
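
One detail worth keeping in mind is that the second argument of strings.TrimRight is a set of characters, not a suffix. The sketch below compares it with strings.TrimSuffix and strings.TrimSpace on one sample input; the function name variants and the input are invented for the comparison.

```go
package main

import (
	"fmt"
	"strings"
)

// variants (hypothetical helper) applies three trimming functions to the
// same input so their results can be compared side by side.
func variants(input string) (suffix, right, space string) {
	suffix = strings.TrimSuffix(input, "\n")  // removes at most one exact suffix
	right = strings.TrimRight(input, "\r\n") // removes every trailing '\r' and '\n'
	space = strings.TrimSpace(input)         // removes all leading and trailing whitespace
	return suffix, right, space
}

func main() {
	suffix, right, space := variants("  two words  \r\n\n")
	fmt.Printf("%q\n", suffix) // "  two words  \r\n"
	fmt.Printf("%q\n", right)  // "  two words  "
	fmt.Printf("%q\n", space)  // "two words"
}
```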

Isn't the question about "extracting substring", not about trim? — Espresso, Nov 02 '20 at 19:43
Following this post's title: yes, but following OP's actual question: no. You might want to read OP's full text in which he specifically wants to get rid of newlines via trimming: "so I came up with the following code to trim the newline character" — Philipp Pixel, Nov 07 '20 at 03:12

Thushara Buddhika · Answer 8 · 2021-09-07T05:58:18.717

I hope this function is helpful to someone:

str := "Error 1062: Duplicate entry 'user@email.com' for key 'users.email'"
getViolatedValue(str)

It extracts the substring enclosed in single quotes (') from the main string:

func getViolatedValue(msg string) string {
    i := strings.Index(msg, "'")

    if i > -1 {
        part := msg[i+1:]
        j := strings.Index(part, "'")
        if j > -1 {
            return part[:j]
        }
        return ""
    } else {
        return ""
    }
}
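
On Go 1.18 or later, the same idea can probably be written more compactly with strings.Cut. This is a sketch of one possible variant, not the answer's own code: firstQuoted is a hypothetical name, and unlike getViolatedValue it also reports whether a complete pair of quotes was found.

```go
package main

import (
	"fmt"
	"strings"
)

// firstQuoted (hypothetical helper) returns the text between the first pair
// of single quotes in msg, and false if there is no complete pair.
func firstQuoted(msg string) (string, bool) {
	// Everything after the opening quote.
	_, rest, found := strings.Cut(msg, "'")
	if !found {
		return "", false
	}
	// Everything before the closing quote.
	value, _, found := strings.Cut(rest, "'")
	if !found {
		return "", false
	}
	return value, true
}

func main() {
	str := "Error 1062: Duplicate entry 'user@email.com' for key 'users.email'"
	value, ok := firstQuoted(str)
	fmt.Println(value, ok) // user@email.com true
}
```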
