I have a dynamic text file whose content can contain fixed lines (each appearing only once) and x repeated blocks. Each block starts with the same code line "S21.G00.30.001", but the blocks do not necessarily have the same contents. This is an extract of the content:

S10.G00.00.001,'www.mywebsite.com' //fixed line
S10.G00.00.002,'Company name' // fixed line
S10.G00.00.003,'v2.01' //fixed line
S10.G00.00.005,'02' //fixed line
.............

S21.G00.30.001,'employee one' //block 1
S21.G00.30.002,'AAAA'
S21.G00.30.004,'BBBB' 
S21.G00.30.005,'02'
S21.G00.30.006,'16021993'
S21.G00.30.007,'4'
S21.G00.30.008,'A Renasca' 
S21.G00 whatever code here ,'some text' // 30 or 40 or 55 ...
S21.G00.30.001,'employee 2' //block 2, S21.G00.30.001 is the divider
S21.G00.30.002,'CCCC'
S21.G00.30.004,'DDDD' 
S21.G00.30.005,'02'
S21.G00 whatever code here ,'some text' // 30 or 40 or 55 ...
S21.G00.30.001,'employee 3' //block 3, S21.G00.30.001 is the divider
S21.G00.30.002,'EEEE'
S21.G00.30.004,'FFFF' 
S21.G00.30.005,'02'
S21.G00.30.007,'4'
S21.G00.30.008,'some text 3'
S21.G00 whatever code here ,'some text' // 30 or 40 or 55 ...

To get the values of the fixed lines, which appear only once, I use this method:

$file = fopen($this->getParameter('dsn_txt_folder') . 'dsn.txt', 'r');

if ($file) {
    while (($line = fgets($file)) !== false) {
        if (str_starts_with($line, 'S10.G00.00.001')) {
            $website = $this->getStringBetween($line, "'", "'");
        }
        if (str_starts_with($line, 'S10.G00.00.002')) {
            $companyName = $this->getStringBetween($line, "'", "'");
        }
        if (str_starts_with($line, 'S10.G00.00.003')) {
            $version = $this->getStringBetween($line, "'", "'");
        }
        .......
    }

    fclose($file);
}

But for the x repeated blocks, how can I extract each block? Every block starts with the divider line "S21.G00.30.001", but where it ends is unknown. I would like to put each block inside an array so that I can easily read the values of each line.

The divider or the separator between each block is the line with "S21.G00.30.001".

Finally, for those 3 blocks, I'd like to get an array like this:

array:1 [▼
  0 => array:3 [▼
    1 => array:7 [▼
      0 => "S21.G00.30.001,'employee one'"
      1 => "S21.G00.30.002,'AAAA'"
      2 => "S21.G00.30.004,'BBBB'"
      3 => "S21.G00.30.005,'02'"
      4 => "S21.G00.30.006,'16021993'"
      5 => "S21.G00.30.007,'4'"
      6 => "S21.G00.30.008,'A Renasca'"
      7 => "S21.G00.40.008,'some text'"
      8 => "whatever code here,'some text'"
    ]
    2 => array:5 [▼
      0 => "S21.G00.30.001,'employee 2'"
      1 => "S21.G00.30.002,'CCCC'"
      2 => "S21.G00.30.004,'DDDD'"
      3 => "S21.G00.30.005,'02'"
      4 => "S21.G00.30.006,'16021993'"
      5 => "whatever code here,'some text'"
    ]
    3 => array:6 [▼
      0 => "S21.G00.30.001,'employee 3'"
      1 => "S21.G00.30.002,'EEEE'"
      2 => "S21.G00.30.004,'FFFF'"
      3 => "S21.G00.30.005,'02'"
      4 => "S21.G00.30.007,'4'"
      5 => "S21.G00.30.008,'some text 3'"
      6 => "whatever code here,'some text'"
    ]
  ]
]
hous
  • We need a [mcve]. You have not given a concrete, minimal sample input string (it has an ellipsis in it). You have not yet provided your exact desired output from the input. If I answer, my answer will start with https://stackoverflow.com/q/1269562/2943403 – mickmackusa Jun 21 '22 at 20:51
  • @mickmackusa I have updated the end of my question and added an example of how the array should look – hous Jun 22 '22 at 09:14
  • Why does employee 2 have `S21.G00.30.006`? Does your file actually have `.............` as a divider? – mickmackusa Jun 23 '22 at 09:59
  • @mickmackusa No, it's a typo. Each block starts with "S21.G00.30.001" and can contain other lines with codes like "S20.G00.40.001"; not every line contains "30". The divider is "S21.G00.30.001": whenever this line appears, a new block begins – hous Jun 23 '22 at 10:26

2 Answers

Personally, I would first get all the data with file_get_contents(). Then use preg_match_all() to extract what I need. You can adapt this solution to use fopen(), fgets(), and preg_match() on your own.

A good regex will capture exactly what you need, then it's up to you to organize the data according to your logic. Here is an example that can handle multiple "id" strings:

<?php

//$data = file_get_contents($this->getParameter('dsn_txt_folder') . 'dsn.txt');
$data = "
S10.G00.00.001,'www.mywebsite.com' //fixed line
S10.G00.00.002,'Company name' // fixed line
S10.G00.00.003,'v2.01' //fixed line
S10.G00.00.005,'02' //fixed line
.............

S21.G00.30.001,'employee one' //block 1
S21.G00.30.002,'AAAA'
S21.G00.30.004,'BBBB' sx
S21.G00.30.005,'02'
S21.G00.30.006,'16021993'
S21.G00.30.007,'4'
S21.G00.30.008,'A Renasca'
S21.G00.30.001,'employee 2' //block 2
S21.G00.30.002,'CCCC'
S21.G00.30.004,'DDDD' 
S21.G00.30.005,'02'
S21.G00.30.001,'employee 3' //block 3
S21.G00.30.002,'EEEE'
S21.G00.30.004,'FFFF' 
S21.G00.30.005,'02'
S21.G00.30.007,'4'
S21.G00.30.008,'some text 3'
";
$extracted = [];
$ids = [
    'S21.G00.30.',
    //'S10.G00.00.',
];
foreach($ids as $id){
  $regex = "/^".implode('\\.', explode('.', $id))."(\d{3}),'(.*)'/m"; // "/^S21\.G00\.30\.(\d{3}),'(.*)'/m"
  $matches = [];
  $block = 0;
  preg_match_all($regex, $data, $matches);
  foreach($matches[0] as $i => $full){
    if('001' === $matches[1][$i]) 
      ++$block;
    $extracted[$id][$block][$matches[1][$i]] = $matches[2][$i];
  }
}

var_export($extracted);

This will yield the following:

array (
  'S21.G00.30.' => 
  array (
    1 => 
    array (
      '001' => 'employee one',
      '002' => 'AAAA',
      '004' => 'BBBB',
      '005' => '02',
      '006' => '16021993',
      '007' => '4',
      '008' => 'A Renasca',
    ),
    2 => 
    array (
      '001' => 'employee 2',
      '002' => 'CCCC',
      '004' => 'DDDD',
      '005' => '02',
    ),
    3 => 
    array (
      '001' => 'employee 3',
      '002' => 'EEEE',
      '004' => 'FFFF',
      '005' => '02',
      '007' => '4',
      '008' => 'some text 3',
    ),
  ),
)

See it in action here: https://onlinephp.io/c/fc256
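The same grouping can also be done line by line, as suggested above, with fgets() and preg_match() instead of loading the whole file. The following is only a minimal sketch: the function name extractBlocksByPrefix is hypothetical, the regex is built with preg_quote() rather than explode()/implode(), and an in-memory stream stands in for the real file.

```php
<?php
// Hypothetical helper (not part of the code above): reads a stream line by
// line and groups the values for one code prefix. A line whose three-digit
// suffix is "001" starts a new block.
function extractBlocksByPrefix($stream, string $prefix): array
{
    // preg_quote() escapes the dots so that they match literally.
    $regex = '/^' . preg_quote($prefix, '/') . '(\d{3}),\'(.*)\'/';
    $blocks = [];
    $block = 0;
    while (($line = fgets($stream)) !== false) {
        if (!preg_match($regex, $line, $m)) {
            continue; // the line does not carry this prefix
        }
        if ($m[1] === '001') {
            ++$block; // "001" marks the start of a new block
        }
        $blocks[$block][$m[1]] = $m[2];
    }
    return $blocks;
}

// An in-memory stream stands in for fopen('dsn.txt', 'r') here.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "S21.G00.30.001,'employee one'\n");
fwrite($stream, "S21.G00.30.002,'AAAA'\n");
fwrite($stream, "S21.G00.30.001,'employee 2'\n");
fwrite($stream, "S21.G00.30.002,'CCCC'\n");
rewind($stream);

$blocks = extractBlocksByPrefix($stream, 'S21.G00.30.');
fclose($stream);

var_export($blocks);
```

Like the preg_match_all() version, this only keeps lines that share the given prefix, so lines with other codes inside a block are skipped.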

Arleigh Hix
  • Thank you. I have updated the end of my question. Each block starts with "S21.G00.30.001" and can contain other lines with codes like "S20.G00.40.001"; not every line contains "30". The divider between blocks is "S21.G00.30.001": whenever this line appears, a new block begins. What I need is every line from "S21.G00.30.001" up to the last line before the next block, whatever the code of each line is – hous Jun 23 '22 at 10:31

You can parse the file line by line. Keep the current block in an array variable and populate it as rows are parsed; when a new block starts, add the previous block to the final result array.

The following code uses basic functions (and not $this-> calls, as you have in the question). You can update the code as you wish.

<?php
// the file was placed on my server for testing
$file = fopen('test.txt','r');
// this will contain the final result
$result = [];
// currentBlock is null at first
$currentBlock = null;
while (($line = fgets($file)) !== false) {
    // extracting the line code
    $lineCode = substr($line, 0, 14);
    // checking if the row contains a value, between two '
    $rowComponents = explode("'", $line);
    if (count($rowComponents) < 2) {
        // the row is not formatted ok
        continue;
    }
    $value = $rowComponents[1];
    switch ($lineCode) {
        case 'S10.G00.00.001':
            $website = $value;
            break;
        case 'S10.G00.00.002':
            $companyName = $value;
            break;
        case 'S10.G00.00.003':
            $version = $value;
            break;
        case 'S21.G00.30.001':
            // starting a new entry
            if ($currentBlock !== null) {
                // we already have a block being parsed
                // so we added it to the final result
                $result[] = $currentBlock;
            }
            // starting the current block as an empty array
            $currentBlock = [];
            $currentBlock['property1'] = $value;
            break;
        case 'S21.G00.30.002':
            $currentBlock['property2'] = $value;
            break;
        case 'S21.G00.30.004':
            $currentBlock['property4'] = $value;
            break;
    }
}
// adding the last entry into the final result
// only if the block exists
if ($currentBlock !== null) {
    $result[] = $currentBlock;
}
fclose($file);
// output the result for debugging
// you also have the $website, $companyName, $version parameters populated
var_dump($result);

?>

After the script runs, I get the following output from the var_dump call:

array(3) {
  [0]=>
  array(3) {
    ["property1"]=>
    string(12) "employee one"
    ["property2"]=>
    string(4) "AAAA"
    ["property4"]=>
    string(4) "BBBB"
  }
  [1]=>
  array(3) {
    ["property1"]=>
    string(10) "employee 2"
    ["property2"]=>
    string(4) "CCCC"
    ["property4"]=>
    string(4) "DDDD"
  }
  [2]=>
  array(3) {
    ["property1"]=>
    string(10) "employee 3"
    ["property2"]=>
    string(4) "EEEE"
    ["property4"]=>
    string(4) "FFFF"
  }
}
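If every line of a block should be kept regardless of its code, as the comments on the question ask, the same "current block" idea can be applied without a switch. This is only a sketch under some assumptions: the function name splitIntoBlocks is hypothetical, blank lines are dropped, and lines before the first divider (the fixed lines) are ignored here because they are handled separately.

```php
<?php
// Hypothetical helper: splits lines into blocks. A block starts at a line
// beginning with $divider and runs until the next divider or the end of input.
function splitIntoBlocks(iterable $lines, string $divider = 'S21.G00.30.001'): array
{
    $blocks = [];
    $current = null; // stays null until the first divider is seen

    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        if (strncmp($line, $divider, strlen($divider)) === 0) {
            if ($current !== null) {
                $blocks[] = $current; // close the previous block
            }
            $current = []; // open a new block
        }
        if ($current !== null) {
            $current[] = $line; // keep the line, whatever its code is
        }
    }

    if ($current !== null) {
        $blocks[] = $current; // do not forget the last block
    }

    return $blocks;
}

// Sample lines shortened from the question's extract.
$lines = [
    "S10.G00.00.001,'www.mywebsite.com'",
    "S21.G00.30.001,'employee one'",
    "S21.G00.30.002,'AAAA'",
    "S21.G00.40.008,'some text'",
    "S21.G00.30.001,'employee 2'",
    "S21.G00.30.002,'CCCC'",
];

$blocks = splitIntoBlocks($lines);
var_export($blocks);
```

For a real file, the lines could be supplied with file('test.txt', FILE_IGNORE_NEW_LINES), or the loop could be moved into the fgets() loop shown above.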
Cosmin Staicu