I am using a script to exclude a list of words from another list of keywords, and I would like to change the format of the output. (I found the script on this website and made some modifications.)

Example:

Phrase from outcome: my word

I would like to add quotes: "my word"

I was thinking that I should write the outcome to new-file.txt and then rewrite it, but I do not understand how to capture the result. Please give me some tips. It's my first script :)

Here is the code:

<?php
    $myfile = fopen("newfile1.txt", "w") or die("Unable to open file!");
    //    Open a file to write the changes - test
    $file = file_get_contents("test-action-write-a-doc-small.txt");
    //  In small.txt there are words that will be excluded from the big list  
    $searchstrings = file_get_contents("test-action-write-a-doc-full.txt");
    //  From this list the script is excluding the words that are in small.txt      
    $breakstrings = explode(',',$searchstrings);
    foreach ($breakstrings as $values){
      if(!strpos($file, $values)) {
        echo $values." = Not found;\n";
      } 
      else {
        echo $values." = Found; \n";
      }
    }
    echo "<h1>Outcome:</h1>";  
    foreach ($breakstrings as $values){
      if(!strpos($file, $values)) {
        echo $values."\n";
      } 
    }
    fwrite($myfile, $values); //    write the result in newfile1.txt - test

    //    a loop is missing?

    fclose($myfile); //    close newfile1.txt - test
?>   

There is also a small mistake in the script. It works fine, but before entering the list of words in test-action-write-a-doc-full.txt and in test-action-write-a-doc-small.txt I have to put a line break on the first line; otherwise it does not find the first word.

Example:

In test-action-write-a-doc-small.txt words:

pick, lol, file, cool,

In test-action-write-a-doc-full.txt words:

pick, bad, computer, lol, break, file.

Outcome:

Pick = Not found -- here is the mistake.

It happens if I do not put a line break on the first line of the .txt file.

lol = Found

file = Found

Thanks in advance for any help! :)

trincot
Krista

1 Answer

You can collect the accepted words in an array, and then glue all those array elements into one text, which you then write to the file. Like this:

echo "<h1>Outcome:</h1>";  
// Build an array with accepted words
$keepWords = array();
foreach ($breakstrings as $values){
  // remove white space surrounding word
  $values = trim($values);
  // compare with false, and skip empty strings
  if ($values !== "" and false === strpos($file, $values)) {
    // Add word to end of array, you can add quotes if you want
    $keepWords[] = '"' . $values . '"';
  } 
}
// Glue all words together with commas
$keepText = implode(",", $keepWords);
// Write that to file
fwrite($myfile, $keepText);

Note that you should not write !strpos(..) but false === strpos(..) as explained in the docs.
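To see why the comparison matters, here is a small sketch (the sample string is made up for illustration, not taken from your files). strpos returns the integer 0 when the match sits at the very start of the string, and !0 evaluates to true, so !strpos(..) reports a word in first position as not found. That may well be what happens with the first word in your list.

```php
<?php
// Hypothetical sample data, not the real file contents
$file = "pick, lol, file, cool,";

$pos = strpos($file, "pick");   // match at offset 0
var_dump($pos);                 // int(0)
var_dump(!$pos);                // bool(true): looks like "not found"
var_dump(false === $pos);       // bool(false): correctly treated as found

$missing = strpos($file, "bad");
var_dump(false === $missing);   // bool(true): really not found
```

A line break at the start of the file shifts the first word to offset 1, which is one way the problem can appear to go away.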

Note also that this method of searching in $file may give unexpected results. For instance, if you have "misery" in your $file string, then the word "is" (if separated by commas in the original file) will be rejected, because strpos finds it as a substring of "misery". You might want to review this.
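If whole-word matching is what you need, one possible approach (a sketch only, with made-up sample data and a hypothetical $excluded array that is not in your script) is to split both lists into words and compare them with in_array instead of searching the raw string:

```php
<?php
// Hypothetical sample data standing in for the two files
$file = "misery, lol, file";
$searchstrings = "is, lol, computer";

// Split both lists into trimmed words
$excluded = array_map('trim', explode(',', $file));
$candidates = array_map('trim', explode(',', $searchstrings));

$keepWords = array();
foreach ($candidates as $word) {
    // in_array compares whole words; the third argument makes it strict
    if ($word !== "" && !in_array($word, $excluded, true)) {
        $keepWords[] = '"' . $word . '"';
    }
}
echo implode(",", $keepWords);   // "is","computer"
```

Here "is" is kept even though "misery" contains it, because only complete words are compared.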

Concerning the second problem

The fact that it does not work without first adding a line break in your file leads me to think it is related to the Byte-Order Mark (BOM) that appears at the beginning of many UTF-8 encoded files. The problem and possible solutions are discussed here and elsewhere.

If indeed it is this problem, there are two solutions I would propose:

Use your text editor to save the file as UTF-8, but without BOM. For instance, Notepad++ has this option in the Encoding menu.

Or, add this to your code:

function removeBOM($str = "") {
    if (substr($str, 0,3) == pack("CCC",0xef,0xbb,0xbf)) {
        $str = substr($str, 3);
    }
    return $str;
}

and then wrap all your file_get_contents calls with that function, like this:

$file = removeBOM(file_get_contents("test-action-write-a-doc-small.txt"));
//  In small.txt there are words that will be excluded from the big list
$searchstrings = removeBOM(file_get_contents("test-action-write-a-doc-full.txt"));
//  From this list the script is excluding the words that are in small.txt

This will strip these funny bytes from the start of the string taken from the file.
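As a quick check of what the BOM does to the first word, here is a small sketch. The $raw string simulates a file's content and is an assumption for illustration; removeBOM is the function from above, repeated so the snippet runs on its own.

```php
<?php
function removeBOM($str = "") {
    if (substr($str, 0, 3) == pack("CCC", 0xef, 0xbb, 0xbf)) {
        $str = substr($str, 3);
    }
    return $str;
}

// Simulated file content: a UTF-8 BOM followed by the word list
$raw = "\xEF\xBB\xBF" . "pick, lol, file";

$words = explode(',', $raw);
var_dump($words[0] === "pick");   // bool(false): the BOM is glued to the first word
var_dump(strlen($words[0]));      // int(7): 3 BOM bytes + 4 letters

$clean = explode(',', removeBOM($raw));
var_dump($clean[0] === "pick");   // bool(true)
```

Note that trim() with its default character list does not remove these three bytes, so the BOM has to be stripped explicitly.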

Community
trincot
  • Many thanks for your reply! It's very kind of you! I have tried the first part regarding the quotes. 1. $keepWords = array(); - it saves ALL the values, not sorted. So, in newfile1.txt it saves the aggregated values that I had in test-action-write-a-doc-small.txt and test-action-write-a-doc-full.txt. 2. $keepWords[] = '"' . $values . '"'; - it adds quotes, but not around the word itself. It adds the quotes after the word. Ex.: My word " ". I will try the second part later)) Thanks again! – Krista Nov 29 '15 at 11:24
  • I updated the code and added two notes below the code. This should fix some of the issues. I also removed the `echo` from the loop, as I think it will lead you to wrong conclusions about where quotes are applied. – trincot Nov 29 '15 at 12:26
  • Thanks again for the corrections! I did the following: `if(!strpos($file, $values)) { echo $values."\n"; $values = rtrim($values); // simple rtrim also works // Add word to end of array $keepWords[] = $values."&quot;"; //you cannot put quotes in front, it will still add them at the end.` – Krista Nov 30 '15 at 13:50
  • To add the quotes, I have this solution: `code``code` – Krista Nov 30 '15 at 13:51
  • And regarding the blank line in front: basically, I found out that I have to put one spare line only in test-action-write-a-doc-small.txt. This makes me think that there is a problem with `$breakstrings = explode(',',$searchstrings);` because it might not check the zero argument in `$file = file_get_contents("test-action-write-a-doc-small.txt");`. But it is just my guess. – Krista Nov 30 '15 at 13:56
  • Please reread what I said about not using `!strpos`. If you say it works with `&quot;`, it means there is a larger context I do not know of; some HTML container in which this is displayed, maybe a `textarea` or something. But I am glad to hear you got that part working! For the blank line in front: did you not try the two solutions I suggested? If you did, what was the result? – trincot Nov 30 '15 at 15:47
  • Hi, @trincot! How are you? :) When I added the "!strpos" as you proposed, it stopped showing the results for "Outcome"; however, I could see them saved in newfile1.txt. It also did not solve the issue for the brackets. They were still added at the end. So, I have found the other solution. " or ' " ' it does not make any difference, I can use both. As for the blank area, I tried your solution, but maybe I did not install it correctly, so it gave me an error. I also tried to remove utf-8 from the code. I have a feeling that this is not the issue. – Krista Dec 01 '15 at 07:37
  • I did not suggest adding `!strpos`. I suggested changing it to what the PHP docs advise! I don't know what you mean by the brackets. Anyway, your code has now changed in several respects, and these comments are not the ideal means to help you. If you still need an answer, then please post a new question altogether. That will give you a faster solution in the end, since more people will start looking at it again. – trincot Dec 01 '15 at 08:58
  • Ok, no problem. Many thanks for your help! I've really learned a lot from you! So, now I could build something that solves my needs! :) – Krista Dec 01 '15 at 11:49