
I have a CSV file that uses a space as the delimiter. Some of the fields contain spaces, and those fields are wrapped in double quotes. Any field with a null/empty value is represented as "-". Fields that are not null/empty and do not contain spaces are not wrapped in double quotes. Here's an example of one row in the CSV file:

foobar "foo bar" "-" "-" "-" fizzbuzz "fizz buzz" fizz buzz

Also, the CSV file has no headers. I was going to use a simple solution such as this one https://stackoverflow.com/a/20769342/3299397, but using strings.Split(csvInput, " ") wouldn't handle the spaces inside the fields. I've also looked into this library https://github.com/gocarina/gocsv, but I'm curious whether there's a solution that doesn't use a third-party library.
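To see why a plain split falls short, here is a minimal sketch (a hypothetical demo; the names row and naiveSplit are made up for illustration) that splits the example row on single spaces. The row has 9 logical fields, but the quoted fields are cut in two and keep their quote characters.

```go
package main

import (
	"fmt"
	"strings"
)

// row is the example line from the question.
const row = `foobar "foo bar" "-" "-" "-" fizzbuzz "fizz buzz" fizz buzz`

// naiveSplit splits on every single space and ignores quoting entirely.
func naiveSplit(s string) []string {
	return strings.Split(s, " ")
}

func main() {
	fields := naiveSplit(row)
	// "foo bar" and "fizz buzz" each become two pieces, so 9 fields turn into 11.
	fmt.Println(len(fields)) // prints 11
	fmt.Printf("%q\n", fields)
}
```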

Kyle Bridenstine

1 Answer


This is "plain" CSV format where the separator is the space character instead of a comma or semicolon. The encoding/csv package can handle this: just set the reader's Comma field to ' '.

As for your null/empty fields: after parsing, use a loop as a post-processing step and replace every "-" value with the empty string.

Using the input:

const input = `foobar "foo bar" "-" "-" "-" fizzbuzz "fizz buzz" fizz buzz
f2 "fo ba" "-" "-" "-" fd "f b" f b`

Parsing and post-processing it:

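// Assumes imports of "encoding/csv", "fmt" and "strings".
r := csv.NewReader(strings.NewReader(input))
r.Comma = ' ' // use space instead of comma as the field separator
records, err := r.ReadAll()
if err != nil {
    panic(err)
}
fmt.Printf("%#v\n", records)

// Post-process: turn the "-" null marker into the empty string.
// The loop variable is named rec so it does not shadow the reader r.
for _, rec := range records {
    for i, v := range rec {
        if v == "-" {
            rec[i] = "" // rec shares its backing array with records, so this updates records in place
        }
    }
}
fmt.Printf("%#v\n", records)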

Output (try it on the Go Playground):

[][]string{[]string{"foobar", "foo bar", "-", "-", "-", "fizzbuzz", "fizz buzz", "fizz", "buzz"}, []string{"f2", "fo ba", "-", "-", "-", "fd", "f b", "f", "b"}}
[][]string{[]string{"foobar", "foo bar", "", "", "", "fizzbuzz", "fizz buzz", "fizz", "buzz"}, []string{"f2", "fo ba", "", "", "", "fd", "f b", "f", "b"}}
icza
    @KyleBridenstine The `encoding/csv` package states that it implements the CSV format described in [RFC4180](https://tools.ietf.org/html/rfc4180), which allows optional quoted fields. This is also explicitly mentioned in the package doc of `encoding/csv`: _"Fields which start and stop with the quote character " are called quoted-fields. The beginning and ending quote are not part of the field."_ – icza Jan 29 '20 at 19:42
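To illustrate the point about optional quoting, here is a small sketch (the input line and the helper name readLine are made up for illustration): quoted and unquoted fields can be mixed on one line, the surrounding quotes are dropped, and, as RFC 4180 specifies, a double quote inside a quoted field is written as two double quotes.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readLine is a hypothetical helper that parses one space-delimited line.
func readLine(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = ' '
	return r.Read()
}

func main() {
	// Unquoted, quoted-with-space, and quoted-with-escaped-quote fields.
	fields, err := readLine(`plain "with space" "say ""hi"""`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", fields) // prints ["plain" "with space" "say \"hi\""]
}
```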