
I wrote the following SSCCE to show the problem. I have a string of HTML, and I use the simple_html_dom parser to find the div with a particular value of the class attribute. That part works fine. I then need to remove this div from the original string, so I use str_replace, but the replacement does not seem to happen. Please tell me why, and what the solution is.

I checked the answers to questions about similar problems, but none of them applied to or solved mine. I also tried str_replace_first from this question's answer by Bas, but that does not work either.

You can see in the screenshot that it just prints the entire $haystack after printing --------.

$haystack = '<div class="region-content"  style="margin-right:100px; margin-left:100px;">
                                                                                                <div role="main"><span id="maincontent"></span><div class="que description informationitem notyetanswered" id="q6"><h4 class="accesshide">Question text</h4><input type="hidden" name="q3:6_:sequencecheck" value="1" /><div class="qtext"><p style="font-family: HelveticaNeueW01-55Roma, Helvetica, Arial, san-serif; margin: 0px 0px 20px; padding: 0px; color: #464646; font-size: 14.4444446563721px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 19.5px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: #ffffff;">Schools expect textbooks to be a valuable source of information for students. My research suggests, however, that textbooks that address the place of Native Americans within the history of the United States distort history to suit a particular cultural value system. In some textbooks, for example, settlers are pictured as more humane, complex, skillful, and wise than Native Americans. In essence, textbooks stereotype and depreciate the numerous Native American cultures while reinforcing the attitude that the European conquest of the New World denotes the superiority of European cultures. Although textbooks evaluate Native American architecture, political systems, and homemaking, I contend that they do it from an ethnocentric, European perspective without recognizing that other perspectives are possible. </p>
<p style="font-family: HelveticaNeueW01-55Roma, Helvetica, Arial, san-serif; margin: 0px 0px 20px; padding: 0px; color: #464646; font-size: 14.4444446563721px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 19.5px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: #ffffff;">One argument against my contention asserts that, by nature, textbooks are culturally biased and that I am simply underestimating children\'s ability to see through these biases. Some researchers even claim that by the time students are in high school, they know they cannot take textbooks literally. Yet substantial evidence exists to the contrary. Two researchers, for example, have conducted studies that suggest that children\'s attitudes about particular cultures are strongly influenced by the textbooks used in schools. Given this, an ongoing, careful review of how school textbooks depict Native Americans is certainly warranted.</p></div><div class="im-controls"><input type="hidden" name="q3:6_-seen" value="1" /></div></div>

<div class="que multichoice deferredfeedback notyetanswered" id="q7"><div class="qtext"><p><span style="color: #464646; font-family: HelveticaNeueW01-55Roma, Helvetica, Arial, san-serif; font-size: 14.4444446563721px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 19.5px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; display: inline !important; float: none; background-color: #ffffff;">Which of the following would most logically be the topic of the paragraph immediately following the passage?</span></p></div><div class="ablock"><div class="prompt">Select one:</div><div class="answer"><div class="r0"><input type="radio" name="q3:7_answer" value="0" id="q3:7_answer0" /><label for="q3:7_answer0">a. <span style="color: #464646; font-family: HelveticaNeueW01-55Roma, Helvetica, Arial, san-serif; font-size: 14.4444446563721px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 19.5px; orphans: auto; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; display: inline !important; float: none; background-color: #ffffff;">the contributions of European immigrants to the development of the United States</span></label> </div>
<div class="r1"><input type="radio" name="q3:7_answer" value="1" id="q3:7_answer1" /><label for="q3:7_answer1">b. <span style="color: #464646; font-family: HelveticaNeueW01-55Roma, Helvetica, Arial, san-serif; font-size: 14.4444446563721px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 19.5px; orphans: auto; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; display: inline !important; float: none; background-color: #ffffff;"><span class="Apple-converted-space"> </span>the centrality of the teacher\'s role in United States history courses</span></label> </div>
<div class="r0"><input type="radio" name="q3:7_answer" value="2" id="q3:7_answer2" /><label for="q3:7_answer2">c. <span style="color: #464646; font-family: HelveticaNeueW01-55Roma, Helvetica, Arial, san-serif; font-size: 14.4444446563721px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 19.5px; orphans: auto; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; display: inline !important; float: none; background-color: #ffffff;">nontraditional methods of teaching United States history</span></label> </div>
<div class="r1"><input type="radio" name="q3:7_answer" value="3" id="q3:7_answer3" /><label for="q3:7_answer3">d. <span style="color: #464646; font-family: HelveticaNeueW01-55Roma, Helvetica, Arial, san-serif; font-size: 14.4444446563721px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 19.5px; orphans: auto; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; display: inline !important; float: none; background-color: #ffffff;">specific ways to evaluate the biases of United States history textbooks <br /></span></label> </div>
<div class="r0"><input type="radio" name="q3:7_answer" value="4" id="q3:7_answer4" /><label for="q3:7_answer4">e. <span style="color: #464646; font-family: HelveticaNeueW01-55Roma, Helvetica, Arial, san-serif; font-size: 14.4444446563721px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 19.5px; orphans: auto; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; display: inline !important; float: none; background-color: #ffffff;">ways in which parents influence children\'s political attitudes <br /></span></label> </div>
</div></div></div>';

require('C:/xampp/htdocs/simple_html_dom.php');
$html = str_get_html($haystack);
foreach($html->find('div[class=que description informationitem notyetanswered]') as $h) {
    $reading_passage_outertext = $h->outertext;
}
$hay = str_replace($reading_passage_outertext, "", $haystack);
echo $reading_passage_outertext;
echo '---------------------------------------------------------------------------------------------------------------';
echo $hay;

[screenshot of the output]

Solace
  • So are all those newline characters identical in `$reading_passage_outertext`? – Mark Baker Sep 15 '14 at 13:28
  • Since it's 100% unlikely that str_replace itself fails, it means your haystack or search string aren't ok for some reason. Since we can't see the **real** values, it means you need to dig further yourself. – N.B. Sep 15 '14 at 13:28
  • @MarkBaker Which newline characters? – Solace Sep 15 '14 at 13:32
  • @N.B. You are seeing the real values. I copied them from the page-source of my page. And when does a string become "not OK" to be affected by str_replace? – Solace Sep 15 '14 at 13:34
  • Well, there are at least 9 newline characters in your test `$haystack`.... I don't know if the div you're searching for spans multiple lines, but it's certainly possible that simple_html_dom is changing line breaks because they are simply whitespace to a web browser.... check the exact value of `$reading_passage_outertext` – Mark Baker Sep 15 '14 at 13:35
  • @MarkBaker Firstly, Thank you! I manually removed the newlines in the `$haystack` in the example. It started working. But can you tell if there is a way to remove all newline characters from a string coming from a database or something (that is I am not creating the string)? – Solace Sep 15 '14 at 13:42
  • I guess `outertext` is returning an interpretation of parsed html. So it probably removes excess spaces, and may jumble the order of attributes. In other words you're getting a different string out than what `str_get_html` took in. You may want to just remove the div using the parser instead of `str_replace` – stakolee Sep 15 '14 at 13:44
  • @stakolee - good call on attributes as well, and +1 for recommendation to use the parser rather than str_replace, you should provide it as an answer – Mark Baker Sep 15 '14 at 13:46
  • Now that you've made it work, you can see what I meant when I mentioned *real* values. The string you copy-pasted here is not the same string that you have to work with. Mark Baker correctly assumed it's about newlines - we can't see them. You can. Newline characters are carriage return and newline (`"\r"` and `"\n"`). There are many, many methods available to you in order to do anything you like with them, including replacing them. There are probably more of those that we can't see, and the usual suspect is of course `"\t"` as well. – N.B. Sep 15 '14 at 13:46
  • If somebody can write an answer, I will accept it, so it is helpful for people in future. – Solace Sep 15 '14 at 13:53

2 Answers


I guess outertext returns an interpretation of the parsed HTML, so it probably removes excess whitespace and may reorder attributes. In other words, the string you get out is not identical to the one str_get_html took in, which is why str_replace finds no match. You may want to remove the div using the parser instead of str_replace.
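
A minimal sketch of removing the div with a parser rather than with str_replace. Note the assumptions: it uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs without an extra library, and the `$haystack` here is a shortened, hypothetical stand-in for the markup in the question.

```php
<?php
// Hypothetical, shortened stand-in for the question's $haystack.
$haystack = "<div class=\"region-content\">\n"
    . "<div class=\"que description informationitem notyetanswered\" id=\"q6\">\n"
    . "<p>Reading passage</p>\n"
    . "</div>\n"
    . "<div class=\"que multichoice deferredfeedback notyetanswered\" id=\"q7\">"
    . "<p>Question</p></div>\n"
    . "</div>";

// Real-world HTML often triggers parser warnings; collect them instead of printing.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($haystack);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Exact match on the full class attribute, like the selector in the question.
$matches = $xpath->query(
    '//div[@class="que description informationitem notyetanswered"]'
);

// Take a snapshot of the matches, then detach each one from its parent.
foreach (iterator_to_array($matches) as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize only the outer container, not the <html>/<body> wrapper
// that loadHTML() adds around the fragment.
$container = $xpath->query('//div[@class="region-content"]')->item(0);
$hay = $doc->saveHTML($container);

echo $hay;
```

Because the div is removed from the parsed tree, it no longer matters whether the parser's serialization of that div is byte-for-byte identical to the original string.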

stakolee

What I can see in your example is that

$hay = str_replace($reading_passage_outertext, "", $haystack);
echo $reading_passage_outertext;

is outside of the foreach. So $reading_passage_outertext only contains the last entry, and str_replace does not really work.

I'm also not really sure why you run the string through str_get_html. It doesn't seem to make sense and costs quite some performance.

str_replace also accepts arrays as pattern and replacement values. Try using str_replace only.
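
As a hedged illustration of passing an array to str_replace, the sketch below strips `"\r"`, `"\n"` and `"\t"` from both strings before the replacement, which is the whitespace issue raised in the comments. Both strings are hypothetical stand-ins; whether this is enough for real parser output depends on what else the parser changes.

```php
<?php
// Hypothetical original HTML with Windows line endings.
$haystack = "<div id=\"outer\">\r\n<div id=\"q6\">\r\n<p>Passage</p>\r\n</div>\r\n"
    . "<div id=\"q7\">Question</div>\r\n</div>";

// Hypothetical re-serialized fragment that uses "\n" instead of "\r\n".
$needle = "<div id=\"q6\">\n<p>Passage</p>\n</div>";

// Direct replacement finds nothing because the line endings differ.
$direct = str_replace($needle, '', $haystack);

// An array as the search argument removes every listed character in one call.
$whitespace   = ["\r", "\n", "\t"];
$flatHaystack = str_replace($whitespace, '', $haystack);
$flatNeedle   = str_replace($whitespace, '', $needle);

// With both strings normalized the same way, the fragment can be matched.
$hay = str_replace($flatNeedle, '', $flatHaystack);

echo $hay;
```

Keep in mind that this also removes line breaks and tabs inside text content, so it changes more than just the markup between tags.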

Chris West