I am attempting to complete the Google Data Analytics Certification Capstone Project regarding Cyclistic Bike Share data. I downloaded the 2022 monthly data from here: https://divvy-tripdata.s3.amazonaws.com/index.html.
There are two datetime columns, started_at and ended_at, that contain the time a client rented and returned the bike. Using these two columns and MySQL, I wanted to get the difference of those two to get a trip_length column.
Using DBeaver and Visual Studio Code, I am able to correctly display the data ordered by trip length in descending order:
SELECT
started_at,
ended_at,
TIMESTAMPDIFF(MINUTE, started_at, ended_at) AS trip_length
FROM
bikes.processed
ORDER BY
3 DESC
LIMIT 1000;
However, every time I want to flip the order of the results (in ascending order), both DBeaver and Visual Studio Code show all data as NULLs. I went over the data cleanup process and made sure to completely delete all tuples containing null variables.
I found a workaround, which is using the top answer here: MySQL Orderby a number, Nulls last
SELECT
started_at,
ended_at,
TIMESTAMPDIFF(MINUTE, started_at, ended_at) AS trip_length
FROM
bikes.processed
ORDER BY ISNULL(trip_length), trip_length ASC
LIMIT 10;
I am a bit uneasy about both systems showing nulls and me having to push the null results to the end. Why did this happen? Can I trust my analysis if some random nulls occurred? Thank you so much!
The following answers showed all null answers:
SELECT
started_at,
ended_at,
TIMESTAMPDIFF(MINUTE, started_at, ended_at) AS trip_length
FROM
bikes.processed
ORDER BY 3 ASC
LIMIT 10;
SELECT
started_at,
ended_at,
TIMESTAMPDIFF(MINUTE, started_at, ended_at) AS trip_length
FROM
bikes.processed
ORDER BY ISNULL(trip_length), trip_length ASC
LIMIT 10;
I expected to find at least find the started_at and ended_at columns populated, but for some reason both systems only show columns with NULL inside them.