1

Context: I'm retrieving stock data from the MySQL database (such as Date, Open, Low, Close and High prices) and I have some duplicates on the table. To avoid this, I'm trying to use the DISTINCT in my query, however, I keep getting some duplicates on the Date. I believe this is happening because while some of the Date is the same, the Open, High, Low, and Close prices are not, therefore I keep getting duplicate dates regardless.

I've tried using UNION to get the distinct dates and then use 2nd select on my query to select the open, high, low, and close prices, but no success:

(SELECT DISTINCT DATE) 
UNION 
(SELECT Open, High, Low, Close 
FROM symbolsv2 
WHERE ticker='AAPL' 
AND `intval`='1d' 
AND Open 
AND High 
AND Low 
AND Close IS NOT NULL 
ORDER BY DATE)

This is what is happening with some of the data I have on my MySQL: Results

As you can see, the Dates are the same, but the prices are different (this is related to the web scraping that I'm doing that can glitch from time to time and retrieve the prices for the same date, but at different periods of the day).

This is my original SQL query:

if (isset($_POST['submit'])) {
    $ticker = $_POST['ticker'];
    $intval = $_POST['intval'];
    $Date = $_POST['date'];

    $stmt = $conn->prepare("SELECT DISTINCT Date, Open, High, Low, Close 
                            FROM symbolsv2 
                            WHERE ticker=? 
                            AND `intval`=? 
                            AND Date>=? 
                            AND Date 
                            AND Open 
                            AND High 
                            AND Low 
                            AND Close IS NOT NULL 
                            ORDER BY Date");
    $stmt->bind_param("sss", $ticker, $intval, $Date);
    $stmt->execute();

    $result = $stmt->get_result();
    while ($data = $result->fetch_assoc()) {
        $data_array[] = $data;
    }
    $stmt->close();
}
$conn->close();
?>

How could I do it so that I no longer have this dulpicates from Date? I don't keeping the 1st record for example (i.e. keep the 76th index and remove the 77th, for example).

I found this SO - SQL: Two select statements in one query, but I couldn't get it to work

Thank you in advance, any help is welcome (I'm fairly new to SQL)

RiggsFolly
  • 93,638
  • 21
  • 103
  • 149
  • I was trying to do the `UNION` implementation, but I keep getting syntax error `Unknown Date in field list` Edit: This was in response to a user questioning why I couldn't get it to work however it seems the comment has been deleted –  Nov 02 '21 at 12:18
  • *I'm retrieving stock data from MySQL database (such as Date, Open, Low, Close and High prices) and I have some duplicates on the table* DISTINCT removes only completely duplicated rows. If two rows differs in at least one byte in one columns then these rows are not duplicates. Use GROUP BY. – Akina Nov 02 '21 at 12:20
  • `DISTINCT` is supposed to give you unique combinations. If you have two columns, first name and last name, with Bob Smith and Bob Johnson, you’ll get two rows. If you want only one Bob, you need to come up with rules for which is the winner, and which are the losers. – Chris Haas Nov 02 '21 at 12:21
  • Exactly. DISTINCT is for *all* the columns in the SELECT, not just Date. – Torbjörn Stabo Nov 02 '21 at 12:21
  • Right, that's what I suspected as well. Is there a way to somehow remove one of the Duplicates from `Date` even though the close, open, high and low prices could be slightly different? –  Nov 02 '21 at 12:23
  • _This is what is happening with some of the data I have on my MySQL:_ Seems like that is what you should be fixing then – RiggsFolly Nov 02 '21 at 12:25
  • 2
    @ROO You could perhaps use `GROUP BY Date`? **But** then you should look into what you get in the other columns then(which row is "chosen") if you have multiple rows with the same Date value. A bit like Chris Haas said. Edit: Perhaps you can use `ORDER BY some_suitable_column(s)` for that, together with grouping. – Torbjörn Stabo Nov 02 '21 at 12:26
  • @TorbjörnStabo Thank you, `GROUP BY Date` fixed the problem. I am now getting the 1st `Date`. Thank you all for your help! –  Nov 02 '21 at 12:35

1 Answers1

0

I guess you want to consolidate any duplicate records you have. You can use this GROUP BY query.

SELECT Date, ticker, intval,
       AVG(Open) AS Open, 
       MAX(High) AS High,
       MIN(Low)  AS Low,
       AVG(Close) AS Close 
  FROM symbolsv2 
 WHERE ticker=? 
   AND `intval`=?
   AND Date >= ?
 GROUP BY Date, ticker, intval;

This will yield one row for each distinct combination of Date, ticker, and intval. It handles duplicate rows possibly with different data by averaging the multiple open and close prices, and taking the highest of the high and lowest of the low prices. (How you want to handle these duplicates with different values is up to you, of course. I just guessed.)

(UNION doesn't work the way you think it does. It concatenates two result sets.)

O. Jones
  • 103,626
  • 17
  • 118
  • 172