
I'm trying to split text using two rules:

  1. Split by whitespace
  2. Split words longer than 5 characters into two separate words (e.g. aaaaawww into aaaaa- and www)

I created a regex that detects these rules (https://regex101.com/r/fyskB3/2), but I can't figure out how to make both rules work in text.split(/REGEX/).

Current regex: (([\s]+)|(\w{5})(?=\w))

For example, if the initial text is hello i am markopollo, the result should be ['hello', 'i', 'am', 'marko-', 'pollo'].

nl pkr

4 Answers


It would probably be easier to use .match: match up to 5 characters that aren't spaces:

const str = 'wqerweirj ioqwejr qiwejrio jqoiwejr qwer qwer';
console.log(
  str.match(/[^ ]{1,5}/g)
)
CertainPerformance
You're missing an edge case. It's trivial to add dashes to substrings that are exactly 5 chars long, but you can't assume that it was *longer* than 5 and therefore needs a dash. – mpen Aug 13 '18 at 20:34
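One way to handle that edge case while keeping a single regex is to peek at the character after each chunk with a lookahead. The sketch below is a minimal take on it and assumes a runtime with String.prototype.matchAll (ES2020). It also treats any non-whitespace character as part of a word. The function name splitByMatch is hypothetical:

```javascript
// Hypothetical helper: match up to 5 non-whitespace characters and use a
// lookahead to peek at the character that follows each chunk.
function splitByMatch(str) {
  // (\S{1,5})  -> the chunk itself (group 1)
  // (?=(\S)?)  -> zero-width peek; group 2 is set only if a non-whitespace
  //               character follows, i.e. the word continues past the chunk
  const regex = /(\S{1,5})(?=(\S)?)/g;
  return Array.from(str.matchAll(regex), (m) =>
    m[2] !== undefined ? m[1] + '-' : m[1]
  );
}

console.log(splitByMatch('hello i am markopollo'));
// [ 'hello', 'i', 'am', 'marko-', 'pollo' ]
console.log(splitByMatch('abcde abcdef'));
// [ 'abcde', 'abcde-', 'f' ]
```

Because \S{1,5} is greedy, a chunk shorter than 5 characters can only end at whitespace or at the end of the string. That means group 2 is set exactly when the word really continues, so a word of exactly 5 characters gets no dash.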

My approach would be to process the string before splitting (I'm a big fan of RegEx):

1- Replace every group of 5 consecutive word characters that doesn't end the word with "$1- " (the group followed by a dash and a space).

The pattern (\w{5}\B) does the trick: \w{5} matches exactly 5 word characters, and \B matches only if the fifth character is not the last character of the word (i.e., another word character follows it).

2- Split the string by spaces.

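const text = "hello123467891234 i am markopollo";
// 5 word characters that are not at the end of a word (\B)
const regex = /(\w{5}\B)/g;

// Step 1: append "- " after each such group
const processedText = text.replace(regex, "$1- ");

// Step 2: split on runs of whitespace
const result = processedText.split(/\s+/);

console.log(result);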

Hope it helps!
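The two steps above can be wrapped into a reusable function. This is a minimal sketch that assumes you want the chunk size as a parameter instead of a hard-coded 5. The names splitByReplace and size are hypothetical. It trims the input first so that leading or trailing whitespace doesn't produce empty strings:

```javascript
// Hypothetical helper: the replace-then-split approach with a configurable
// chunk size instead of a hard-coded 5.
function splitByReplace(text, size = 5) {
  // Builds (\w{size}\B): `size` word characters not at the end of a word.
  const chunkRegex = new RegExp(`(\\w{${size}}\\B)`, 'g');
  return text
    .trim()                      // avoid empty first/last items
    .replace(chunkRegex, '$1- ') // "markopollo" -> "marko- pollo"
    .split(/\s+/);               // split on runs of whitespace
}

console.log(splitByReplace('hello i am markopollo'));
// [ 'hello', 'i', 'am', 'marko-', 'pollo' ]
console.log(splitByReplace('hello i am markopollo', 3));
// [ 'hel-', 'lo', 'i', 'am', 'mar-', 'kop-', 'oll-', 'o' ]
```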

Rodrigo Ferreira

Something like this should work:

const str = "hello i am markopollo";
const words = str.split(/\s+/);
const CHUNK_SIZE=5;

const out = [];
for(const word of words) {
  if(word.length > CHUNK_SIZE) {
      let chunks = chunkSubstr(word,CHUNK_SIZE);
      let last = chunks.pop();
      out.push(...chunks.map(c => c + '-'),last);
  } else {
      out.push(word);
  }
}
console.log(out);

// credit: https://stackoverflow.com/a/29202760/65387
function chunkSubstr(str, size) {
  const numChunks = Math.ceil(str.length / size)
  const chunks = new Array(numChunks)

  for (let i = 0, o = 0; i < numChunks; ++i, o += size) {
    chunks[i] = str.slice(o, o + size) // slice instead of the deprecated substr
  }

  return chunks
}

I.e., first split the string into words on whitespace, then find the words longer than 5 chars and 'chunk' them. I popped off the last chunk to avoid adding a - to it, but there might be a more efficient way if you patch chunkSubstr instead.
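Patching chunkSubstr could look roughly like this. In this sketch, a hypothetical chunkWithDashes adds the dash itself, so there's no need to pop off the last chunk. It assumes Array.prototype.flatMap is available (ES2019):

```javascript
// Hypothetical variant of chunkSubstr that appends the dash itself.
function chunkWithDashes(str, size) {
  const chunks = [];
  for (let o = 0; o < str.length; o += size) {
    const chunk = str.slice(o, o + size);
    // Every chunk except the last one gets a trailing dash.
    chunks.push(o + size < str.length ? chunk + '-' : chunk);
  }
  return chunks;
}

const CHUNK_SIZE = 5;
const out = 'hello i am markopollo'
  .split(/\s+/)
  .flatMap((word) => chunkWithDashes(word, CHUNK_SIZE));
console.log(out); // [ 'hello', 'i', 'am', 'marko-', 'pollo' ]
```

A side effect of this version is that an empty word (e.g. the empty string split produces from leading whitespace) yields no chunks, so it drops out of the result.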

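Splitting with a regex (text.split(regex)) doesn't work so well because split removes whatever the regex matches from the output (or, if the regex has capturing groups, inserts the captured text as extra items). In your case, it appears you want to drop the whitespace but keep the words, so splitting on both won't work.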
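To see the problem concretely, here is what split returns for the regex from the question. The capturing groups put the whitespace and the matched chunk into the array, groups that didn't participate in a match show up as undefined, and no dash is added anywhere:

```javascript
// The regex from the question, passed straight to String.prototype.split.
const text = 'hello i am markopollo';
const parts = text.split(/(([\s]+)|(\w{5})(?=\w))/);

console.log(parts);
// parts contains 17 items:
//   'hello', ' ', ' ', undefined,
//   'i', ' ', ' ', undefined,
//   'am', ' ', ' ', undefined,
//   '', 'marko', undefined, 'marko',
//   'pollo'
```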

mpen

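This uses the regex from @CertainPerformance's answer ([^ ]{1,5}), calls regex.exec in a loop to walk through all the matches, and appends a dash to a 5-character match whenever the next character in the string exists and is not a space.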

Demo:

const str = 'wqerweirj ioqwejr qiwejrio jqoiwejr qwer qwer'
let regex1 = RegExp('[^ ]{1,5}', 'g')

function customSplit(targetString, regexExpress) {
  let result = []
  let matchItem = null
  while ((matchItem = regexExpress.exec(targetString)) !== null) {
    result.push(
      matchItem[0] + ( 
        matchItem[0].length === 5 && targetString[regexExpress.lastIndex] && targetString[regexExpress.lastIndex] !== ' '
         ? '-' : '')
    )
  }
  return result
}
console.log(customSplit(str, regex1))
console.log(customSplit('hello i am markopollo', regex1))
Sphinx