I'm trying to take a list of lines and have PHP output only the lines that contain the same word (supplied as a variable) twice. It should match both the singular and plural forms of the word.

Example list of lines:

This is a best website of all the websites out there

This is a great website

Here is a website I found while looking for websites

Website is a cool new word

I would put these lines into a textbox, and the script would output:

This is a best website of all the websites out there

Here is a website I found while looking for websites


No need for displaying any counts, only the raw lines that include the word twice.

I'm pretty decent at manipulating lines, but I searched everywhere for an answer to this and it doesn't seem to exist.

Joe
  • https://stackoverflow.com/questions/16333681/regex-repeated-words-on-the-same-line This seems like the direction you want to be headed in. – lit Aug 06 '17 at 03:07
  • I'm totally clueless about Javascript... Don't think this post is going to accomplish what I need to do. – Joe Aug 06 '17 at 03:09
  • It's about using regex. https://stackoverflow.com/questions/18029230/regex-match-for-each-line Another link that may lead you in the right direction. – lit Aug 06 '17 at 03:34
  • I'm more thinking this guy is on to something. https://stackoverflow.com/a/5995141/8423219 - but I'm trying to figure out how to make it work. – Joe Aug 06 '17 at 03:41
  • How do you plan on defining a line? – lit Aug 06 '17 at 03:44
  • A line is defined by a line break. I will insert a list of "lines" aka sentences (the sentences are all broken down line-by-line) into a textbox, submit... Most of my scripts create text files with the lines/sentences using fopen, and then I manipulate the lines from that text file in the script. – Joe Aug 06 '17 at 03:46
  • I feel like this is the exact process I need to do: 1) Count mentions in each line. 2) Output lines that contain a duplicate of whatever word I specify. I feel like the answer I linked is something close.. but I can't get it to work. – Joe Aug 06 '17 at 03:47
  • Put the lines into an array, use a loop and a regex to find the matches? – lit Aug 06 '17 at 03:48
  • I'm pretty much a php newb, I know just enough to manipulate content / lines to perform the functions I need. Usually I can find answers very quickly on SO, but not this one. Can you answer this -- how would I execute the code I listed? I've tried echo'ing "duplicates" but I get errors. https://stackoverflow.com/a/5995141/8423219 – Joe Aug 06 '17 at 03:50

1 Answer

For testing purposes I didn't read the input with something like $text = $_POST['text'];. Instead I stored the text in a variable. The class I'm using to pluralize words comes from here.

Note: I rolled back the answer so that it addresses exactly the question; the previous answer, which tried to address the comments, has been moved here.

<?php

$text = "This is a best website of all the websites out there
This is a great website
Here is a website I found while looking for websites
Website is a cool new word";

// helps us pluralize all words, so we can check for duplicates
include('class.php');

// loop over the lines one by one
foreach (explode("\n", $text) as $line)
{
        // drop surrounding whitespace (e.g. a trailing "\r")
        $line = trim($line);

        // remove special characters
        $tline = preg_replace('/[^A-Za-z0-9\-\s]/', '', $line);

        // create a list of lowercase words from the current line
        $words_list = preg_split('/\s+/', strtolower($tline), -1, PREG_SPLIT_NO_EMPTY);

        // convert all singular words to plural,
        // starting from an empty array for every line
        $w = [];
        foreach ($words_list as $word)
        {
                $w[] = Inflect::pluralize($word);
        }

        // if the line has more words than unique words,
        // it contains duplicates, so echo this line out
        if (count($w) > count(array_unique($w)))
                echo $line . '<br>';
}

The output for your desired text would be:

This is a best website of all the websites out there
Here is a website I found while looking for websites

However, the correctness of the code really depends on how well the pluralize method works.


How it works

First I loop over the lines one by one using foreach and explode. At each iteration I build a list of words from that line with preg_split, then convert all singular words to plurals (or plurals to singulars, it doesn't really matter). Now every word in the list is plural, so I can easily check whether they are all unique: if the line has more words than unique words, it contains duplicate words, so I print that line.
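The steps above can be sketched in a self-contained form that does not need the external class. The naive_singular helper below is a hypothetical stand-in for the pluralizing class: it only strips a trailing "s" from words longer than three letters, so it is an assumption for illustration and nowhere near a real English inflector. The function name lines_with_duplicates is likewise made up for this sketch.

```php
<?php

// Hypothetical stand-in for a real inflector: strips a trailing "s" from
// words longer than three letters. An assumption for illustration only.
function naive_singular(string $word): string
{
    return (strlen($word) > 3 && substr($word, -1) === 's')
        ? substr($word, 0, -1)
        : $word;
}

// Return the lines in which some word occurs more than once,
// treating singular and plural forms as the same word.
function lines_with_duplicates(string $text): array
{
    $matches = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // keep only letters, digits, hyphens and whitespace
        $clean = preg_replace('/[^A-Za-z0-9\-\s]/', '', strtolower($line));
        $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
        $normalized = array_map('naive_singular', $words);
        // more words than unique words means at least one duplicate
        if (count($normalized) > count(array_unique($normalized))) {
            $matches[] = $line;
        }
    }
    return $matches;
}

$text = "This is a best website of all the websites out there\n"
      . "This is a great website\n"
      . "Here is a website I found while looking for websites\n"
      . "Website is a cool new word";

echo implode("\n", lines_with_duplicates($text)), "\n";
```

Because every word is compared with every other word, this version also flags lines such as one containing both "one" and "ones", which may or may not be what you want.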

Ravexina
  • It's close but not working with certain words... Example, try this: Playing blackjack on the blackjack table. Here's a blackjack game. My favorite blackjack and blackjack. Wow a great blackjack game. This example only outputs a single line "Playing blackjack on the blackjack table." when there are two lines that contain "blackjack" twice. – Joe Aug 06 '17 at 04:30
  • As I said, it depends on the class that is in charge of pluralizing the words. There is nothing we can do about it short of writing a much better class that is capable of handling the whole English language. – Ravexina Aug 06 '17 at 04:31
  • Really appreciate your help. This is close to working... I'm trying every modification I know how to do, but I can't make it work. – Joe Aug 06 '17 at 04:40
  • It is because of the dot... `blackjack.` is different from `blackjack`. – Ravexina Aug 06 '17 at 04:40
  • @Joe I updated the answer, check it out again... you can accept it if you think it was helpful to you ;) – Ravexina Aug 06 '17 at 04:50
  • So close but it doesn't understand capital letters vs not capital letters (a duplicate is a duplicate regardless of capitalized or not). Also, it's still not only matching duplicates of the provided word. For example: "For one thing, blackjack games are much more convenient than offline ones as they can be played online." Notice how it doesn't remove this line, because it contains "one" and "ones". – Joe Aug 06 '17 at 05:41
  • @Joe learn a little bit of PHP, it's not hard ... I can't even remember when was the last time I wrote a project in PHP, or hire someone willing to develop it for you. php.net is your friend too ;) – Ravexina Aug 06 '17 at 05:57
  • @Joe: we discourage offers of payment here. The whole point of Stack Overflow is to help people on questions they have made an effort on prior to asking, since these are the sorts of questions that might be useful to other readers. The last problems are something you could try yourself, would you give it a go? – halfer Aug 06 '17 at 08:57
  • @halfer I did try myself, I spent 5+ hours trying everything, reading dozens, if not hundreds of posts on SO, and found nothing. I'm just going to hire a coder to finish the script. – Joe Aug 06 '17 at 23:22
  • Fair enough @Joe. The trouble regular readers have is that we see many zero-effort posts every day. I'm sure that does not apply to you, but _showing_ what you have done (even though it does not work) really does help reassure readers that you're not just after a bit of free work. – halfer Aug 06 '17 at 23:37
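
The remaining requirements raised in the comments (ignore capitalization and punctuation, and count only one chosen word) can be sketched with a case-insensitive regex instead of a pluralizer. This is a minimal sketch and not the answer's method: the function name lines_with_repeated_word and the rule that a plural is the word plus an optional "s" or "es" are assumptions for illustration, and the sample lines are adapted from the comments above.

```php
<?php

// Hypothetical helper: return the lines that contain $word (or the word
// followed by "s" or "es") at least $min times, ignoring letter case.
function lines_with_repeated_word(string $text, string $word, int $min = 2): array
{
    // \b keeps "one" from matching inside "phone"; the "i" flag ignores case
    $pattern = '/\b' . preg_quote($word, '/') . '(?:es|s)?\b/i';
    $matches = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // preg_match_all returns the number of full pattern matches
        if ($line !== '' && preg_match_all($pattern, $line) >= $min) {
            $matches[] = $line;
        }
    }
    return $matches;
}

$text = "Playing blackjack on the blackjack table.\n"
      . "Here's a blackjack game.\n"
      . "My favorite Blackjack and blackjack.\n"
      . "For one thing, blackjack games are much more convenient than offline ones.";

echo implode("\n", lines_with_repeated_word($text, 'blackjack')), "\n";
```

Since only the chosen word is counted, the last sample line is skipped even though it contains both "one" and "ones". Irregular plurals are not covered by this simple suffix rule.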