I am trying to scrape some data using curl and the simple_html_dom library. I can scrape the data successfully, but the result includes some text that I don't want.

This is the code I am using:

$price = $html->find("div[id='vi-mskumap-none'] span[itemprop='price']",0)->plaintext;

This is the HTML source code:

<div id="vi-mskumap-none" style="" class="u-flL w29 vi-price ">
    <span class="notranslate" id="prcIsum" itemprop="price" style="" content="515.0">US $515.00</span>

It scrapes:

US $515.00

But I want to remove "US $" and keep only:

515.00

Can someone please help?

Muhammad Faisal

4 Answers

Since you say that the format of the string will always be the same, there's no need for any regex. Just use str_replace():

$price = 'US $515.00';
$price = str_replace('US $', '', $price);

Here's a demo: https://3v4l.org/ZDl5t

That will give you a string: 515.00. If you want it to be a real float, then just cast it:

$price = (float)str_replace('US $', '', $price);
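
Putting the two steps together, a minimal sketch might look like the following. The function name `stripUsPrefix` is hypothetical and used only for illustration, and the sketch assumes the scraped text always starts with the literal prefix `US $`, as stated in the question.

```php
<?php
// Hypothetical helper name; it is not part of the original answer.
function stripUsPrefix(string $raw): float
{
    // Trim stray whitespace, drop the fixed "US $" prefix, then cast to float.
    return (float) str_replace('US $', '', trim($raw));
}

$price = stripUsPrefix('US $515.00');

// A float does not keep trailing zeros, so format it again for display.
echo number_format($price, 2, '.', ''), PHP_EOL;
```

Note that once the value is cast to a float, the trailing zero is lost, which is why the sketch calls number_format() before printing.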

M. Eriksson

If you know it's just going to be a number, my method is:

$price = preg_replace("/[^0-9.\-]+/", '', $price);

Here's what the regex means:

  • [^ means we're starting a negative set. It will match anything that is NOT in this set
  • 0-9 means the numbers 0 through 9
  • . is a period, in case the number (like your example) has a decimal point. Normally you have to escape periods in regex, since a period means "any character," but inside a set like this (in square brackets) you don't have to escape it
  • \- is an escaped dash "-", which I added in case negative numbers can appear
  • ] closes off the set
  • + means that it can match one or more characters (this way "US $" is replaced as a single match instead of one character at a time, though I don't know if it makes a difference)

Then I'm replacing anything that matches (everything except a number or period or dash) with an empty string '' which effectively deletes it.
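
To see how this pattern behaves on more variable input, here is a small sketch. The helper name `extractNumber` and the second and third sample strings are assumptions made for illustration; only the first string comes from the question.

```php
<?php
// Hypothetical helper name, used only for illustration.
function extractNumber(string $raw): string
{
    // Delete every run of characters that is not a digit, a period, or a dash.
    return preg_replace('/[^0-9.\-]+/', '', $raw);
}

echo extractNumber('US $515.00'), PHP_EOL;   // 515.00
echo extractNumber('US $1,515.00'), PHP_EOL; // 1515.00
echo extractNumber('EUR -12.50'), PHP_EOL;   // -12.50
```

The second case shows the thousands separator being removed along with the currency text. The pattern assumes a period is the decimal separator, so a price written with a decimal comma would not come out as intended.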

Stevish
  • This may remove a space or comma as a thousands separator – Andreas Jun 28 '19 at 12:40
  • @Andreas, if you want only a number in the result, then you actually want to remove commas and spaces. Unless all you're doing is storing the number to be displayed on the screen later, the commas will only get in the way. – Stevish Jun 28 '19 at 12:45
  • I'm not OP. I'm just saying it could be a problem – Andreas Jun 28 '19 at 12:46
  • Also, for the OP, @Muhammad_Faisal's answer is way faster. Definitely use `str_replace()` anywhere regex isn't needed – Stevish Jun 28 '19 at 12:46
  • @Stevish - And yet you're recommending to use regex here, where regex isn't really needed? :-p – M. Eriksson Jun 28 '19 at 12:49
  • @MagnusEriksson, yeah, I wrote and published this before realizing that his situation was so uniform. By that time, Muhammad_Faisal had already given the correct answer, so there was no need to repeat it. I decided not to delete this in case someone finding this page in the future has a problem with more variable input. – Stevish Jun 28 '19 at 18:36
  • That's fair enough. Just looked a bit funny :-) – M. Eriksson Jun 29 '19 at 11:59

I think you can use getAttribute('content') instead of plaintext to get the required result.
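
The idea of reading the content attribute can be sketched as follows. To keep the example runnable without the simple_html_dom library, it swaps in PHP's built-in DOMDocument and DOMXPath classes (this assumes the standard dom extension is enabled). The markup is the snippet from the question with a closing </div> added so that it parses cleanly.

```php
<?php
// Markup from the question, with the closing </div> added.
$source = '<div id="vi-mskumap-none" class="u-flL w29 vi-price ">'
        . '<span class="notranslate" id="prcIsum" itemprop="price" content="515.0">US $515.00</span>'
        . '</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($source);

// Select the same element the question targets.
$xpath = new DOMXPath($doc);
$span = $xpath->query('//div[@id="vi-mskumap-none"]//span[@itemprop="price"]')->item(0);

// Read the machine-readable value instead of the displayed text.
$price = $span !== null ? $span->getAttribute('content') : null;

echo $price, PHP_EOL; // 515.0
```

The attribute holds 515.0 rather than 515.00, so the value may still need formatting if two decimal places are required.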

SIM

Sorry I'm late:

<?php

$price = "US $515.00";
$price = str_replace('US $', '', $price);
echo '<div id="vi-mskumap-none" style="" class="u-flL w29 vi-price ">';
echo '<span class="notranslate" id="prcIsum" itemprop="price" style="" content="515.0">'.$price.'</span>';

?>

It's working fine.

Dupinder Singh