I have a simple CSV reader that I use to upload a CSV file, do some manipulation on the data, and print a new CSV output.

I'm using the tototoshi CSV library with Scala.

My problem is that my project can handle UTF-8 files, but now I need to support UTF-8-BOM files as well. If someone could explain how to solve this, it would be a great help.

These are the current functions that support UTF-8:

writer:

  //----------------WRITER----------------//
  class CsvDataWriter(csvFile: File, headers: List[String])(implicit format: CSVFormat) {
    val fos = new FileOutputStream(csvFile, false)
    private val writer = {
      CSVWriter.open(fos, "UTF-8")(format)
    }
    writer.writeRow(headers)

    def close() = {
      // Close the writer first so buffered rows are flushed before the stream is closed.
      writer.close()
      fos.close()
    }

    def write(outputCSVRow: RowMap) = writer.writeRow(headers map outputCSVRow)
    def writeHeaders(headers: List[String]) = {
      writer.writeRow(headers)
    }
  }

reader:

  //----------------READER----------------//
  class CsvDataReader(csvFile: File) {

    private val reader = CSVReader.open(csvFile, "UTF-8")(Format)

    val headers: List[String] = reader.readNext().get

    def close() = reader.close()

    def iteratorWithHeaders: Iterator[Map[String, String]] = {
      reader.iterator.map(line => headers.zip(line).toMap)
    }
  }

And this is the upload function that runs when a user selects a file:

  def upload = Action(parse.multipartFormData) { implicit request =>
    request.body.file("file").fold {
      BadRequest("Missing file")
    } { uploadedFile => {

      val localFile = new File("/tmp/" + uploadedFile.ref.file.getName)

      Files.copy(uploadedFile.ref.file.toPath, localFile.toPath, StandardCopyOption.REPLACE_EXISTING)
      localFile.deleteOnExit()
      val j = Json.parse( s"""{"fileId": "${Crypto.encryptAES(localFile.getAbsolutePath)}"}""")

      Ok(j)
    }
    }
  }
JohnBigs
  • The Byte Order Mark [is pointless in UTF-8](http://stackoverflow.com/questions/2223882/whats-different-between-utf-8-and-utf-8-without-bom). If your library doesn't support UTF-8-BOM, I would simply check for the BOM (in either byte order) and strip it from the front of your stream. – Jonathon Reinhart Jan 10 '17 at 13:15
  • @JonathonReinhart can you show me how I can do this? – JohnBigs Jan 10 '17 at 13:28
  • UTF-8 has only one byte order, network byte order which is big-endian. – zaph Jan 10 '17 at 13:29

1 Answer

From another SO answer:

BOM is neither required nor recommended for UTF-8

According to the Unicode standard, the BOM for UTF-8 files is not recommended:

The UTF-8 BOM is a sequence of bytes (EF BB BF) that allows the reader to identify a file as being encoded in UTF-8. Normally, the BOM is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.
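
Since the BOM is just those three bytes at the start of the file, one way to support UTF-8-BOM input is to peek at the first bytes of the stream and skip them when they match. The following is a minimal sketch using only `java.io`; the names `BomUtil`, `skipUtf8Bom`, and `utf8Reader` are hypothetical helpers invented for this example and are not part of the tototoshi library.

```scala
import java.io.{InputStream, InputStreamReader, PushbackInputStream, Reader}
import java.nio.charset.StandardCharsets
import java.util.Arrays

// Hypothetical helper, not part of any library.
object BomUtil {
  // The UTF-8 BOM is the byte sequence EF BB BF.
  private val Utf8Bom: Array[Byte] = Array(0xEF, 0xBB, 0xBF).map(_.toByte)

  /** Wraps `in` so that a leading UTF-8 BOM, if present, is skipped. */
  def skipUtf8Bom(in: InputStream): InputStream = {
    val pushback = new PushbackInputStream(in, Utf8Bom.length)
    val head = new Array[Byte](Utf8Bom.length)

    // Read up to three bytes; a single read() call may return fewer.
    var n = 0
    var r = 0
    while (n < head.length && r != -1) {
      r = pushback.read(head, n, head.length - n)
      if (r > 0) n += r
    }

    val hasBom = n == Utf8Bom.length && Arrays.equals(head, Utf8Bom)
    // No BOM: push the bytes back so the caller sees the stream unchanged.
    if (!hasBom && n > 0) pushback.unread(head, 0, n)
    pushback
  }

  /** Convenience wrapper: a UTF-8 character reader with any BOM removed. */
  def utf8Reader(in: InputStream): Reader =
    new InputStreamReader(skipUtf8Bom(in), StandardCharsets.UTF_8)
}
```

In `CsvDataReader`, the resulting reader could then be passed to the CSV library instead of the file, for example `CSVReader.open(BomUtil.utf8Reader(new FileInputStream(csvFile)))(Format)`. This assumes the library version in use has an `open` overload that accepts a `java.io.Reader`, so check that before relying on it. Files without a BOM pass through unchanged, so the same code path would handle both plain UTF-8 and UTF-8-BOM uploads.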

zaph