0

I am attempting to complete the Google Data Analytics Certification Capstone Project regarding Cyclistic Bike Share data. I downloaded the 2022 monthly data from here: https://divvy-tripdata.s3.amazonaws.com/index.html.

There are two datetime columns, started_at and ended_at, that contain the time a client rented and returned the bike. Using these two columns and MySQL, I wanted to get the difference of those two to get a trip_length column.

Using DBeaver and Visual Studio Code, I am able to correctly display the data ordered by trip length in descending order:

SELECT 
    started_at,
    ended_at,
    TIMESTAMPDIFF(MINUTE, started_at, ended_at) AS trip_length
FROM 
    bikes.processed
ORDER BY 
3 DESC
LIMIT 1000;

However, every time I want to flip the order of the results (in ascending order), both DBeaver and Visual Studio Code show all data as NULLs. I went over the data cleanup process and made sure to completely delete all tuples containing null variables.

I found a workaround, which is using the top answer here: MySQL Orderby a number, Nulls last

SELECT 
    started_at,
    ended_at,
    TIMESTAMPDIFF(MINUTE, started_at, ended_at) AS trip_length
FROM 
    bikes.processed
ORDER BY ISNULL(trip_length), trip_length ASC
LIMIT 10;

I am a bit uneasy about both systems showing nulls and me having to push the null results to the end. Why did this happen? Can I trust my analysis if some random nulls occurred? Thank you so much!

The following answers showed all null answers:

SELECT 
    started_at,
    ended_at,
    TIMESTAMPDIFF(MINUTE, started_at, ended_at) AS trip_length
FROM 
    bikes.processed
ORDER BY 3 ASC
LIMIT 10;
SELECT 
    started_at,
    ended_at,
    TIMESTAMPDIFF(MINUTE, started_at, ended_at) AS trip_length
FROM 
    bikes.processed
ORDER BY ISNULL(trip_length), trip_length ASC
LIMIT 10;

I expected to find at least find the started_at and ended_at columns populated, but for some reason both systems only show columns with NULL inside them.

  • You should cater for nulls and invalid dates in your code since datediff will return null if either is true. – P.Salmon Mar 21 '23 at 07:51
  • Hi P. I thought I did a pretty good job of erasing nulls, so I don't know where they are coming from. Does this happen in the real world? I am doing a career switch, so I have anyone near me to bounce ideas with. Is it okay to continue my analysis even though this issue with nulls occurred? Is there a way to get rid of those shadow NULL values? – Kenny Smith Mar 21 '23 at 20:25

0 Answers0