
Background: I'm uploading a CSV file and a mapping file to my server using FormData and parsing the CSV with Papa Parse.

For some reason, the object Papa Parse outputs (which prints correctly with console.log) cannot be indexed with ordinary strings. I even tried running JSON.parse(JSON.stringify(...)) on both my string and the object to see if that would normalize them.


import Papa from 'papaparse'
import formidable from 'formidable'
import fs from 'fs'

...

const { files, fields } = await parseRequestForm(req)


let parsedMapping: Record<string, string> = JSON.parse(fields.mapping as string)

const f = files.file as formidable.File

const output = await new Promise<{ loadedCount: number; totalCount: number }>(
  (resolve) => {
    const filecontent = fs.createReadStream(f.path)
    filecontent.setEncoding('utf8')

    let loadedCount = 0
    let totalCount = 0

    Papa.parse<Record<string, any>>(filecontent, {
      header: true,
      skipEmptyLines: true,
      dynamicTyping: true,
      chunkSize: 25,
      encoding: 'utf8',

      chunk: async (out) => {
        const data = out.data.map((r) => applyMapping(r, parsedMapping))

        totalCount += data.length

        try {
          await prisma.softLead.createMany({ data }).then((x) => {
            loadedCount += x.count
          })
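        } catch (e) {
          // Log the failure instead of silently swallowing it
          console.error('createMany failed for chunk', e)
        }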
      },

      complete: () => resolve({ loadedCount, totalCount }),
    })
  }
)


type ParsedForm = {
  error: Error | string
  fields: formidable.Fields
  files: formidable.Files
}

function parseRequestForm(req: NextApiRequest): Promise<ParsedForm> {
  const form = formidable({ encoding: 'utf8' })

  return new Promise((resolve, reject) => {
    form.parse(req, (err, fields, files) => {
      // Return early so resolve() is not also called after a rejection
      if (err) return reject(err)

      resolve({ error: err, fields, files })
    })
  })
}

function applyMapping(
  data: Record<string, any>,
  mapping: Record<keyof SoftLead, string>
): Partial<SoftLead> {
  return Object.fromEntries(
    Object.entries(mapping).map(([leadField, csvField]) => {

      // Struggling to access field here

      console.log('Field', `"${csvField}"`)
      console.log('Data', data)

      const parsed = JSON.parse(JSON.stringify(data))

      console.log(Buffer.from(Object.keys(parsed)[0]))
      console.log(Buffer.from(Buffer.from(csvField).toString('utf8')))
      
      console.log(parsed[csvField]) // undefined

      return [leadField, data[csvField]]
    })
  )
}

The Buffer output also shows that the strings are not the same, even though they print identically to the console.

Papa Parse's key

  • Buffer.from(Object.keys(parsed)[0]) => <Buffer ef bb bf 45 6d 61 69 6c 73>

Map object key

  • Buffer.from(Buffer.from(csvField).toString('utf8')) => <Buffer 45 6d 61 69 6c 73>

A normal string

  • Buffer.from(Buffer.from('Emails').toString('utf-8')) => <Buffer 45 6d 61 69 6c 73>
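The mismatch can be reproduced in isolation. The following is a minimal sketch, assuming the key really is the nine bytes printed above; the row object and the email value are hypothetical stand-ins for what Papa Parse would emit with header: true.

```typescript
import { Buffer } from 'node:buffer'

// Bytes reported for the Papa Parse key: ef bb bf followed by "Emails".
const keyBytes = Buffer.from([0xef, 0xbb, 0xbf, 0x45, 0x6d, 0x61, 0x69, 0x6c, 0x73])
const csvKey = keyBytes.toString('utf8')

// Hypothetical row, shaped like one parsed CSV record.
const bomRow: Record<string, unknown> = { [csvKey]: 'a@example.com' }

// The key is one character longer than 'Emails', and that extra
// leading character has no visible glyph when printed.
console.log(csvKey.length, 'Emails'.length)
console.log(csvKey.charCodeAt(0).toString(16))

// Lookup with the plain string misses; lookup with the exact key hits.
console.log(bomRow['Emails'])
console.log(bomRow[csvKey])
```

The extra leading character is what makes the two strings compare as different while looking the same in the console.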

Updates

  • 1: I also tried setting the encoding to utf16le, but I think it fails to parse altogether because FormData apparently only uses UTF-8
Jack
  • `ef bb bf` might be the Byte Order Mark - something to look into – Code Slinger Aug 25 '21 at 16:42
  • @CodeSlinger Interesting, never even heard of that until now. Is that something I can manually clear? Should it have an effect on being able to index my object? – Jack Aug 25 '21 at 16:47

2 Answers


ef bb bf might be the Byte Order Mark, which is illegal in JSON (JSON Specification and usage of BOM/charset-encoding).

If your string has a BOM, you could try clearing it before passing it to JSON.parse or JSON.stringify with .replace("\u00EF\u00BB\u00BF", "").

Code Slinger
  • Hm seems to have no effect on the string when running `.replace(new RegExp('\u00EF\u00BB\u00BF', 'g'), '')` or simply `.replace('\u00EF\u00BB\u00BF', '')` – Jack Aug 25 '21 at 17:00
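A likely reason the replace has no effect: ef bb bf are the UTF-8 bytes of the BOM, but once the file is decoded into a JavaScript string they become the single character U+FEFF. The escapes \u00EF\u00BB\u00BF name three different characters (ï, », ¿) that never appear in the string. A small sketch, using a hypothetical header value:

```typescript
// Hypothetical header, as decoded from a UTF-8 file that starts with a BOM.
const header = '\uFEFFEmails'

// Targets the three characters U+00EF U+00BB U+00BF, which are not present.
const viaByteEscapes = header.replace('\u00EF\u00BB\u00BF', '')

// Targets the single decoded character U+FEFF at the start of the string.
const viaCodePoint = header.replace(/^\uFEFF/, '')

console.log(viaByteEscapes === header)  // true: nothing was replaced
console.log(viaCodePoint === 'Emails')  // true: the BOM is gone
```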

I was able to solve this problem by stripping the BOM from each key, as described here. Simply:

const parsed = Object.fromEntries(
  Object.entries(data).map(([k, v]) => [stripBom(k), v])
)

export default function stripBom(str: string) {
  if (str.charCodeAt(0) === 0xfeff) {
    return str.slice(1)
  }

  return str
}
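To show how this fits the mapping step from the question, here is a self-contained sketch. The normalizeKeys helper, the sample row, and the mapping are hypothetical, and stripBom is repeated only so the snippet runs on its own.

```typescript
// Same check as stripBom above, repeated so this sketch is self-contained.
function stripBom(str: string): string {
  return str.charCodeAt(0) === 0xfeff ? str.slice(1) : str
}

// Hypothetical helper: returns a copy of the row with BOM-free keys.
function normalizeKeys<T>(row: Record<string, T>): Record<string, T> {
  return Object.fromEntries(
    Object.entries(row).map(([key, value]) => [stripBom(key), value])
  )
}

// Hypothetical row and mapping, modelled on the question's data.
const sampleRow: Record<string, string> = {
  '\uFEFFEmails': 'a@example.com',
  Name: 'Ada',
}
const sampleMapping: Record<string, string> = { email: 'Emails', fullName: 'Name' }

// Normalize once per row, then index with the plain CSV field names.
const cleanRow = normalizeKeys(sampleRow)
const lead = Object.fromEntries(
  Object.entries(sampleMapping).map(([leadField, csvField]) => [leadField, cleanRow[csvField]])
)

console.log(lead)
```

Only the first header in a file can carry the BOM, so normalizing every key is harmless but slightly redundant. Papa Parse also documents a transformHeader option; passing stripBom there may achieve the same result at parse time, though that route is not tested here.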
Jack