
I need to process strings that have a mostly regular structure. Each string contains 3 keywords that always appear in the same order: ATLPáscoa, ATLNatal, and ATLVerão.

Between these keywords are an unknown number of whitespace characters. Also, there is the possibility that each of the keywords will be followed by a date value that may consist of non-whitespace and whitespace characters.

I want to declare 3 variables called $datePáscoa, $dateNatal, and $dateVerão and assign to each one the date substring that follows its keyword.

Here's an example:

$string = 'ATLPáscoa            ATLNatal          ATLVerão     Turno11-03a07desetembro';

My desired output is:

$datePáscoa = '';
$dateNatal = '';
$dateVerão = 'Turno11-03a07desetembro';

Here is another example:

$string = 'ATLPáscoa  bananas   ATLNatal xyza sd af          ATLVerão      Turno11-03a07desetembro';

My expected output is:

$datePáscoa = 'bananas';
$dateNatal = 'xyza sd af';
$dateVerão = 'Turno11-03a07desetembro';

I tried using str_replace(), but it is clearly not the right approach:

$string = str_replace("Atelier","",$string );
$string = str_replace("Páscoa","",$string );
$string = str_replace("Natal","",$string );
$string = str_replace("Verão","",$string );

How can I extract the date values and assign the values to the appropriate variable?

mickmackusa
Pbras
  • This is a specification and not a question! **We are very willing to help you fix your code, but we don't write code for you** – RiggsFolly Jul 17 '18 at 14:20
  • To ask an on-topic question, please read the [Question Check list](https://meta.stackoverflow.com/questions/260648/stack-overflow-question-checklist) and [the perfect question](http://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/), learn how to create a [Minimal, Complete and Verifiable Example](http://stackoverflow.com/help/mcve), and [take the tour](http://stackoverflow.com/tour) – RiggsFolly Jul 17 '18 at 14:20
  • @RiggsFolly You are right. I spent so much time trying to explain the problem that I forgot to ask the question. Is it better now? – Pbras Jul 17 '18 at 14:34

2 Answers


Code (the linked demo uses an alternative input string):

$string = 'ATLPáscoa  banana   ATLNatal xyza sd af          ATLVerão      Turno11-03a07desetembro';

$datePáscoa = preg_match('~ATLPáscoa\s*\K(?!ATL)\S+(?:\s+(?!ATL)\S+)*~u', $string, $out) ? $out[0] : '';
$dateNatal = preg_match('~ATLNatal\s*\K(?!ATL)\S+(?:\s+(?!ATL)\S+)*~u', $string, $out) ? $out[0] : '';
$dateVerão = preg_match('~ATLVerão\s*\K\S+(?:\s+\S+)*~u', $string, $out) ? $out[0] : '';

echo '$datePáscoa = '; var_export($datePáscoa); echo "\n";
echo '$dateNatal = '; var_export($dateNatal); echo "\n";
echo '$dateVerão = '; var_export($dateVerão);

Output:

$datePáscoa = 'banana'
$dateNatal = 'xyza sd af'
$dateVerão = 'Turno11-03a07desetembro'

If this were my project, I'd probably build a single regex function call that returns all of the matches in an array, then I'd extract what I wanted, when I wanted it. You have asked for individually named variables, so I think 3 function calls will be simplest to demonstrate.
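The single-call idea mentioned above could look roughly like the following sketch. It is not the answer's own code: the function name extractAtlDates() and the group names pascoa, natal, and verao are hypothetical, and it assumes that all three keywords are always present in this order.

```php
<?php
// Hypothetical helper: returns the three date substrings in one array.
function extractAtlDates(string $string): array
{
    // Each lazy named group captures the text between two keywords;
    // the surrounding \s* keeps leading/trailing whitespace out of the groups.
    $pattern = '~ATLPáscoa\s*(?<pascoa>.*?)\s*ATLNatal\s*(?<natal>.*?)\s*ATLVerão\s*(?<verao>.*?)\s*$~us';

    if (!preg_match($pattern, $string, $m)) {
        // Assumption: an input that does not fit the format yields empty values.
        return ['pascoa' => '', 'natal' => '', 'verao' => ''];
    }

    return ['pascoa' => $m['pascoa'], 'natal' => $m['natal'], 'verao' => $m['verao']];
}

$dates = extractAtlDates('ATLPáscoa  bananas   ATLNatal xyza sd af          ATLVerão      Turno11-03a07desetembro');
var_export($dates);
```

If individually named variables are still wanted, they can be assigned from the array when needed, for example $dateNatal = $dates['natal'];.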

The input that you have offered doesn't require the inclusion of the u pattern modifier, but I am adding it in case your actual data requires it.

\K tells the regex engine to "release previously matched characters" from the full-string match. This avoids the need for a capture group and ensures that the returned value is only the text you want. For the same reason, you see \S+(?:\s+\S+)*, which matches a "word" and then optionally matches one or more whitespace characters followed by another "word", repeated as often as needed.

I am using var_export() in my demo to show that there are no leading or trailing whitespace characters in the results.

(?!ATL) in the first two patterns is used to avoid "over matching" or basically "matching too far". The third pattern doesn't require this consideration.
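To see what "over matching" means in practice, the sketch below runs the first pattern with and without the (?!ATL) lookaheads on the sample string. The variable names $unguarded and $guarded are illustrative only.

```php
<?php
$string = 'ATLPáscoa  banana   ATLNatal xyza sd af          ATLVerão      Turno11-03a07desetembro';

// Without the lookaheads, (?:\s+\S+)* keeps consuming "words" to the end of the string.
preg_match('~ATLPáscoa\s*\K\S+(?:\s+\S+)*~u', $string, $unguarded);

// With the lookaheads, matching stops before any "word" that starts with ATL.
preg_match('~ATLPáscoa\s*\K(?!ATL)\S+(?:\s+(?!ATL)\S+)*~u', $string, $guarded);

var_export($unguarded[0]); // 'banana   ATLNatal xyza sd af          ATLVerão      Turno11-03a07desetembro'
echo "\n";
var_export($guarded[0]);   // 'banana'
```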

mickmackusa
  • @mickmackusa This should probably work, but for some reason it's not detecting "ATLPáscoa" and "ATLVerão". Since I'm getting the string from a database, it possibly contains some other special character that only looks like the "á" in "Páscoa" and the "ã" in "Verão". I was also having problems with this when using str_replace. – Pbras Jul 18 '18 at 13:50
  • I think your suspicions are correct. Make sure that you copy those characters from your database directly into your code to ensure that you are writing the exact unicode character. p.s. I never use accented letters in my variable names -- it only makes things hard to type out, but that's my personal preference. And of course, this link may be relevant: https://stackoverflow.com/q/279170/2943403 – mickmackusa Jul 18 '18 at 13:51
  • I'm using "$string = $query->dates;". Isn't that a direct copy? – Pbras Jul 18 '18 at 13:55
  • That's not what I mean. I mean copy the `Páscoa` and `ATLVerão` from your database, and paste that into my suggested pattern to ensure that the exact expected characters are being used. – mickmackusa Jul 18 '18 at 13:56
  • How should I do that, copying from phpMyAdmin? – Pbras Jul 18 '18 at 14:09
  • Yes, that's what I recommend. – mickmackusa Jul 18 '18 at 14:15
  • I used $string = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string); and now it works, but the output is different from yours: "$datePáscoa = banana ATLNatal xyza sd af ATLVerao Turno11-03a07desetembro", "$dateNatal=xyza sd af ATLVerao Turno11-03a07desetembro" "$dateVerao = Turno11-03a07desetembro" – Pbras Jul 18 '18 at 14:45
  • Are you able to determine if it is the variable names that are breaking the script or the regex pattern that is failing? Are there any clues in the error log? – mickmackusa Jul 18 '18 at 14:45
  • I am going to bed in a minute. Please quickly dump your latest code (just the portion that we are working on) into https://3v4l.org/ , click eval, then send me that new url so that I can see. What is your php version? – mickmackusa Jul 18 '18 at 14:46
  • I'm going away (it is 1am) I'll check back in the morning. – mickmackusa Jul 18 '18 at 14:53
  • I managed to get it to work with the base of your idea: https://3v4l.org/6ZB6G Thanks for the help :) – Pbras Jul 18 '18 at 15:15
  • I'm using "$excel->getActiveSheet()" because I'm trying to convert the database into an Excel sheet; if I use "var_export($out);" it will just print it. So basically, think of my "$excel->getActiveSheet()" as a print. – Pbras Jul 19 '18 at 09:01
  • I've got no experience with that. Just trying to tighten up your process. – mickmackusa Jul 19 '18 at 09:03
  • Yeah, I know, and I appreciate any good idea :) I try to adapt the ideas given to simplify my code. "preg_match('~ATLP\?scoa\s*(\S+(?:\s+\S+)*)?\s*ATLNatal\s*(\S+(?:\s+\S+)*)?\s*ATLVer\?o\s*(\S+(?:\s+\S+)*)?\s*~u', $string, $out);" If I do this, what will be stored in $out? – Pbras Jul 19 '18 at 09:06
  • Also, by using "$string = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', 'ATLPáscoa banana ATLNatal xyza sd af ATLVerão Turno11-03a07desetembro'); $string = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', 'ATLPáscoa banana ATLNatal ATLVerão Turno11-03a07desetembro');", aren't you overwriting the first $string? – Pbras Jul 19 '18 at 09:07
  • Yes. That is deliberate. Those are two test cases so that you can see that it work when all dates exist AND when one or more dates are missing. You can comment-out either one to see the outcome. – mickmackusa Jul 19 '18 at 09:17
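
The suspicion raised in the comments above, that the database holds a character that merely looks like "á", can be checked by comparing raw bytes. The sketch below shows one possible cause, a decomposed (NFD) versus precomposed (NFC) spelling; the thread does not confirm which bytes the database actually returned, and the variable names are illustrative.

```php
<?php
// Two spellings of "Páscoa" that look identical on screen.
$nfc = "P\u{00E1}scoa";   // precomposed U+00E1
$nfd = "Pa\u{0301}scoa";  // "a" followed by combining acute accent U+0301

var_dump($nfc === $nfd);   // bool(false)
echo bin2hex($nfc), "\n";  // 50c3a173636f61
echo bin2hex($nfd), "\n";  // 5061cc8173636f61

// A pattern written with one form does not match text stored in the other.
$matched = preg_match("~ATLP\u{00E1}scoa~u", 'ATL' . $nfd);
var_dump($matched);        // int(0)
```

Running bin2hex() on the value read from the database and on the keyword typed into the pattern shows whether the two differ. If the intl extension is available, Normalizer::normalize() could bring both to the same form before matching.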
0

Well, since ATL is in every part, I would start with an explode:

$array = explode("ATL", $string);

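Note that $array[0] will be an empty string in this case, because the input starts with ATL (and, as I understand it, you get the same type of input every time). Then trim the leading and trailing whitespace from each element like this:

for ( $i = 0; $i < count($array); $i++ ) {
 $array[$i] = trim($array[$i]); // trim() returns the trimmed string, so assign it back
}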

Then copy the elements into their respective variables:

$datePáscoa = $array[1];
$dateNatal = $array[2];
$dateVerão = $array[3];

At this point, they still contain the rest of their keyword (e.g. "Páscoa"), so we cut it off with strstr(), which returns the part of the string starting at the first occurrence of a given substring, or false if that substring is not found:

$datePáscoa = trim(strstr($datePáscoa, ' ') ?: '');

The needle is a single space. When no date follows the keyword, there is no space to find, strstr() returns false, and the ?: '' fallback yields an empty string. The trim() removes any whitespace still surrounding the date.
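
Put together, the explode-based approach could be sketched as follows. The function name extractByExplode() is hypothetical, and preg_split() is swapped in here to separate the keyword remainder from the date at the first run of whitespace. The sketch assumes that no date value itself contains "ATL".

```php
<?php
// Hypothetical helper; keys are the keyword remainders ("Páscoa", "Natal", "Verão").
function extractByExplode(string $string): array
{
    // Split on the shared prefix and drop the element before the first "ATL".
    $parts = array_slice(explode('ATL', $string), 1);

    $dates = [];
    foreach ($parts as $part) {
        // First piece is the keyword remainder; the second, if present, is the date.
        $pieces = preg_split('~\s+~u', trim($part), 2);
        $dates[$pieces[0]] = $pieces[1] ?? '';
    }

    return $dates;
}

$dates = extractByExplode('ATLPáscoa  bananas   ATLNatal xyza sd af          ATLVerão      Turno11-03a07desetembro');
var_export($dates);
```

Because each date stays attached to the keyword that precedes it, a missing date produces an empty string rather than shifting the other values.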

kry
  • I had thought of this option, but since a date may or may not be present, with whitespace between them, the explode wouldn't produce the same values every time. Example: 'ATLPáscoa ATLNatal xyza sd af ATLVerão' $array[1]= Páscoa; $array[2]= Natal; $array[3]=xyza; Meaning that the dates wouldn't be in the same position all the time. – Pbras Jul 18 '18 at 13:01
  • I usually only use explode with " ", so in this case it would be "$array[1]= Páscoa; $array[2]= Natal xyza sd; $array[3]= Verão;". Is this right? – Pbras Jul 18 '18 at 13:24
  • But why would you even do that? I clearly stated use explode with "ATL", using it with whitespace has no use when the whole thing is filled with whitespace... -.-' – kry Jul 18 '18 at 13:38
  • I was using it with "ATL". I was saying that I didn't understand it with "ATL" because I normally use explode with " "; that's why I didn't understand how it worked in this case. – Pbras Jul 18 '18 at 13:47