Is it possible to take a very long string and split it by sentence into array items of 5000 characters or fewer?

Here's what I have so far:

<?php
$text = 'VERY LONG STRING';

foreach(explode('. ', $text) as $chunk) {
    $accepted[] = $chunk;
}
?>

This just splits the string into an array of single-sentence items. I need to group the items into sub-arrays, each holding sentences whose combined length is no more than 5000 characters.

I tried this:

<?php
$text = 'VERY LONG STRING';

foreach(explode('. ', $text) as $chunk) {
    $key = strlen(implode('. ', $accepted).'. '.$chunk) / 5000;
    $accepted[$key][] = $chunk;
}
?>

You can probably see what I tried to do here, but it didn't work.

UPDATE: This did the trick:

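<?php
$text = 'VERY LONG STRING';
$result = [];   // finished groups of sentences
$partial = [];  // group currently being filled
$len = 0;       // length of $partial once re-joined with '. '

foreach (explode('. ', $text) as $chunk) {
  // Count the sentence plus the '. ' separator that explode() removed.
  $chunkLen = strlen($chunk) + 2;
  // Start a new group when this sentence would exceed the limit.
  if ($partial && $len + $chunkLen > 5000) {
    $result[] = $partial;
    $partial = [];
    $len = 0;
  }
  $len += $chunkLen;
  $partial[] = $chunk;
}

// Keep the last, partially filled group.
if ($partial) $result[] = $partial;
?>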
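
If the final array items need to be strings rather than sub-arrays, each group can be joined back together. This is only a minimal sketch: `joinSentenceGroups` is a hypothetical helper name, and it assumes the groups came from `explode('. ', $text)`, so every sentence except the very last one has lost its trailing period.

```php
<?php
// Hypothetical helper (not part of the original code): turns groups of
// sentences back into one string per group.
function joinSentenceGroups(array $groups): array
{
    $strings = [];
    $last = count($groups) - 1;
    foreach (array_values($groups) as $i => $group) {
        $joined = implode('. ', $group);
        // explode('. ', ...) strips the separator from every sentence except
        // the final one, so restore the period on all but the last group.
        $strings[] = $i < $last ? $joined . '.' : $joined;
    }
    return $strings;
}

// Sample groups, shaped like the $result array built by the loop above.
$groups = [['One', 'Two'], ['Three', 'Four.']];
echo json_encode(joinSentenceGroups($groups)), "\n";
// prints ["One. Two.","Three. Four."]
```

A single sentence longer than the limit would still end up as one oversized item; the sketch does not try to break sentences apart.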

Thank you to everyone who responded, your support means a lot.

John Doe
  • Not sure if https://stackoverflow.com/questions/29818232/php-split-string-in-array-after-x-characters-without-cut-word-on-limit is the same. – Nigel Ren May 03 '20 at 16:05
  • You could simply use https://www.php.net/manual/de/function.chunk-split.php – Carsten Massmann May 03 '20 at 16:10
  • Is the `.` in your explode meant to be regex special char or just a variable? Something like `preg_split('/(.{1,5000})/', $text)` could be close. Might also be inefficient though, depending on how large the string is. – user3783243 May 03 '20 at 16:12
  • Hi @Nigel Ren, I was distracted while posting :D - but I assume everyone here should be able to switch languages! ;-) – Carsten Massmann May 03 '20 at 16:16

2 Answers

If I understand your question correctly, you need something like this:

<?php
$text = 'VERY LONG STRING';
$s = chunk_split($text, 3, '|'); // put 5000 instead of 3
$s = substr($s, 0, -1);
$accepted = explode('|', $s);
print_r($accepted);
?>

OR

<?php
$text = 'VERY LONG STRING';
$accepted = str_split($text, 3);
print_r($accepted);
?>

DEMO: https://3v4l.org/H9DAl

DEMO: https://3v4l.org/PN7Aj

Always Sunny

You could do something like this:

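$text = 'VERY LONG STRING';
$result = [];   // completed groups of words
$partial = [];  // group currently being filled
$len = 0;       // length of $partial once re-joined with spaces

foreach (explode(' ', $text) as $chunk) {
  // Count the word plus the space that explode() removed.
  $chunkLen = strlen($chunk) + 1;
  // Start a new group when this word would push the current one past the limit.
  if ($partial && $len + $chunkLen > 5000) {
    $result[] = $partial;
    $partial = [];
    $len = 0;
  }
  $len += $chunkLen;
  $partial[] = $chunk;
}

// Keep the last, partially filled group.
if ($partial) {
  $result[] = $partial;
}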

You can test it more easily by using a lower maximum length.
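
As a rough illustration of testing with a small limit, the loop can be wrapped in a function that takes the maximum as a parameter. The name `groupWords` and the sample input are made up for this sketch, and it assumes the limit should count one separator character per word.

```php
<?php
// Hypothetical wrapper around the grouping loop, with the limit as a parameter.
function groupWords(string $text, int $max): array
{
    $result = [];
    $partial = [];
    $len = 0;

    foreach (explode(' ', $text) as $chunk) {
        // Word length plus one for the space removed by explode().
        $chunkLen = strlen($chunk) + 1;
        // Close the current group before it would exceed $max.
        if ($partial && $len + $chunkLen > $max) {
            $result[] = $partial;
            $partial = [];
            $len = 0;
        }
        $len += $chunkLen;
        $partial[] = $chunk;
    }

    if ($partial) {
        $result[] = $partial;
    }
    return $result;
}

// With a limit of 6, each group holds at most two 2-letter words.
echo json_encode(groupWords('aa bb cc dd ee', 6)), "\n";
// prints [["aa","bb"],["cc","dd"],["ee"]]
```

Once the small case behaves as expected, the same call with `5000` as the second argument should apply the real limit.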

Alejandro De Cicco
  • This was a huge help. After some changes, I got it to do exactly what was needed. I've updated this question with the correct code. Thank you. – John Doe May 03 '20 at 16:34