I have some code that takes a response from an API and splits it into paragraphs by regex line breaks:

const choppedString = mainResponse.split(/\n\s*\n/);

But sometimes this returns a very long paragraph, and I can't push a discord.js embed field that's longer than 1024 characters.

This is where I'm stuck. I can't figure out how to take a paragraph (an element of the .split() array) that is longer than 1024 characters and split it up every 5 sentences. Any help?

MaddieX

1 Answer

I don't know if this is the best/most efficient way to do this, but it works:

const mainResponse = `A short paragraph with less than 5 sentences. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

A longer paragraph over 1024 characters. A sentence ending in a question mark should still work? And another ending with an exclamation mark! A sentence ending with a new line
Sed ac tempor velit. Mauris accumsan sollicitudin enim, a blandit metus blandit at. Aenean metus nulla, faucibus et mattis ut, tincidunt ut ante. Cras feugiat mollis risus, sed luctus orci condimentum at. Etiam condimentum, lacus ut posuere malesuada, lectus elit consectetur eros, eget tincidunt purus ipsum sit amet turpis. Mauris ac eros vitae velit dictum ultrices eu ac velit. Aenean interdum, ex nec vulputate tincidunt, est dolor tristique dui, sed sagittis urna nulla ac risus. Etiam ipsum metus, finibus sit amet pulvinar at, ultrices ac libero. Aenean tristique felis sit amet semper auctor. Integer porta neque sed velit tincidunt scelerisque. Fusce nec justo quis arcu ultrices ultricies. Proin fermentum pellentesque arcu vitae imperdiet. Integer tristique commodo arcu, eu cursus ipsum lobortis eu. Aenean hendrerit posuere ex, nec elementum mi tristique eu. Suspendisse felis purus, ultricies id nisi feugiat, scelerisque malesuada risus. Curabitur sit amet velit finibus, venenatis mauris vitae, tincidunt purus. Morbi eget tortor massa. Donec ut ante luctus, fermentum est a, euismod turpis. Proin risus ex, dignissim ac dignissim eu, semper eget lectus. Cras posuere pulvinar turpis, eu auctor ante fermentum quis. Sed tincidunt eu nulla tempus tempor.`

// This splits up an array into multiple arrays of a maximum length
// stolen from https://stackoverflow.com/a/11764168/8289918
const chunk = (arr, len) => {
  const chunks = []
  let i = 0
  while (i < arr.length) chunks.push(arr.slice(i, i += len))
  return chunks
}

const choppedString = mainResponse
  // Splits it into paragraphs (what you already did)
  .split(/\n\s*\n/)
  .flatMap(paragraph => paragraph.length > 1024
    // If the paragraph is over 1024 characters, split it into arrays with a
    // maximum of 5 sentences...
    ? chunk(paragraph.split(/(?<=[.?!\n])\s*/), 5)
      // ...and then trim each of those sentences (to remove the trailing
      // new line if there is any) and join them
      .map(sentences => sentences.map(s => s.trim()).join(' '))
    // If the paragraph is <= 1024 characters, just keep it as it is
    : paragraph)
console.log(choppedString)

/(?<=[.?!\n])\s*/ explanation:

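  • (?<=[.?!\n]): a positive lookbehind that requires the preceding character to be ., ?, !, or a new line. Because it is a lookbehind, that character is not part of the match, so the split keeps it at the end of the sentence instead of removing it.
  • \s*: zero or more whitespace characters. These are part of the match, so the split removes them.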
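
To see what that regex does on its own, here is a small sketch. The sample string and the names sentenceBreak, sample and sentences are made up for the demo and are not from the API response:

```javascript
// Same regex as above: split right after ., ?, ! or a new line,
// removing any whitespace that follows.
const sentenceBreak = /(?<=[.?!\n])\s*/

// Made-up sample: the fourth "sentence" has no punctuation and only ends
// with a new line.
const sample = 'First one. Is this second? Third!\nFourth has no punctuation\nFifth.'

// trim() removes the new line that stays attached to an unpunctuated sentence
const sentences = sample.split(sentenceBreak).map(s => s.trim())
console.log(sentences)
```

Keep in mind that the regex knows nothing about grammar, so an abbreviation such as "e.g." in the middle of a sentence would also be treated as a sentence end.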

Note that this assumes that every group of 5 sentences, once joined, fits within 1024 characters.
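
If you can't rely on that assumption, one possible approach is to start a new chunk when either limit is reached. This is only a rough sketch: packSentences is a hypothetical helper I made up (it is not part of discord.js), and cutting an overlong sentence at a fixed character count is just one way to handle that case:

```javascript
// Hypothetical helper: groups sentences into strings of at most maxSentences
// sentences and at most maxLength characters each.
const packSentences = (sentences, maxSentences = 5, maxLength = 1024) => {
  const chunks = []
  let current = []
  let currentLength = 0
  for (const sentence of sentences) {
    // A single sentence longer than maxLength is cut into fixed-size pieces
    const pieces = sentence.length > maxLength
      ? sentence.match(new RegExp(`[\\s\\S]{1,${maxLength}}`, 'g'))
      : [sentence]
    for (const piece of pieces) {
      // Length of the chunk if this piece were added (+1 for the joining space)
      const wouldBe = current.length ? currentLength + 1 + piece.length : piece.length
      if (current.length && (current.length >= maxSentences || wouldBe > maxLength)) {
        chunks.push(current.join(' '))
        current = []
        currentLength = 0
      }
      currentLength = current.length ? currentLength + 1 + piece.length : piece.length
      current.push(piece)
    }
  }
  if (current.length) chunks.push(current.join(' '))
  return chunks
}

// Made-up input: 12 sentences of roughly 320 characters each
const longSentences = Array.from({ length: 12 },
  (_, i) => `Sentence number ${i + 1} ` + 'x'.repeat(300) + '.')
const fields = packSentences(longSentences)
console.log(fields.map(f => f.length))
```

In the code above you would call it in place of chunk(...).map(...), passing it the trimmed sentences of the long paragraph.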

Lauren Yim
  • This worked! Thx! The only problem is that the API sometimes returns sentences without punctuation and just puts the next sentence on a new line, and I've been unable to find a regex that'll match this. I'll sometimes get (e.g. 911/mr lonely): `They say the loudest in the room is weak That's what they assume, but I disagree I say the loudest in the room Is prolly the loneliest one in the room (that's me) Attention seeker, public speaker Oh my God, that boy there is so f*ckin' lonely Writin' songs about these people Who do not exist, he's such a f*ckin' phony` – MaddieX May 20 '20 at 04:18
  • @MaddieX Use `/(?<=[.?!\n])\s*/` (I edited my answer). – Lauren Yim May 20 '20 at 05:17