Get elements from numpy array only matching datetime

Question

I have tidal height data with a reading every 10 minutes for 1 year, which I've loaded into a list from a CSV file.

The end result I'm trying to achieve is to be able to (line or bar)graph tides and end up with something like this: https://nt.gov.au/__data/assets/pdf_file/0020/430238/2018-mar-tidal-info-ntports-centre-island-graph.pdf

I'm pretty new to programming and have set myself the smaller task of creating a tidal graph from height data for a given day. I would then output multiple graphs to make up a week etc.

    import numpy as np
    from datetime import datetime

    # Sample DATA rows:
    #     010120170010  1.700
    #     010120170020  1.650

    data_times = []
    data_height = []
    for line in csv_reader:
        data_times.append(datetime.strptime(line[0], "%d%m%Y%H%M"))
        data_height.append(float(line[2]))

    np_data_times = np.array(data_times)
    np_data_height = np.array(data_height)

Create array only with today's heights

Is there a better way to do the Python equivalent of the SQL 'select * from times where date = today()'? Can I create a dictionary of time: height rather than two arrays? (I've read that dicts are unordered, so I stayed away from that approach.)

Plot array divided every 6 hours

I'd also like to provide data points to the chart but only show times divided every 3 or 6 hours across the X axis. This would give a smoother and more accurate graph. So far I've only found out how to give data to the x axis and its labels in a 1:1 fashion, when I may want 6:1 or 18:1 etc. Is there a particular method I should be looking at?

import matplotlib.pyplot as plt
plt.title("Tides for today")
plt.xlabel(datetime.date(real_times[0]))
plt.ylabel("Tide Height")
plt.plot(real_times, real_heights)
plt.show()

I do not understand what "6:1" or "18:1" or "divided every 3 or 6 hours across the X axis" means. Please simply state what the resulting graph should look like. — ImportanceOfBeingErnest, Jun 23 '17 at 10:36
It's a difficult one to explain. Thinking about it further, I do want to plot every data point, but I want to show minor/major divisions along the X axis of the plot. So if I wanted to divide by the hour, I would have 6 data points for every 1 division along the x axis. — Arvo, Jun 24 '17 at 00:33
Could it be that you just want to show gridlines for every hour in the plot? — ImportanceOfBeingErnest, Jun 24 '17 at 01:09
Yes, that's exactly what I'm trying to do. The problem I'm running into with matplotlib is that I don't know how to control labeling along the x axis. I've searched through pyplot.ax() and pyplot.plot but nothing seems to work. Am I missing a particular method? — Arvo, Jun 24 '17 at 06:46

ImportanceOfBeingErnest · Accepted Answer · 2017-06-24T07:19:03.267

Don't use a dictionary. This would make everything slow and hard to handle.
I would suggest using pandas instead.

Reading in the data would work like this:

import pandas as pd
from datetime import datetime

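# Parse timestamps such as 010120170010 (day, month, year, hour, minute).
conv = {"Time": lambda t: datetime.strptime(t, "%d%m%Y%H%M")}
# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas).
df = pd.read_csv("datafile.txt", header=None, sep=r"\s+",
                 names=["Time", "Tide"], converters=conv, index_col=0)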

This results in something like

                     Tide
Time                     
2017-01-01 00:10:00  1.70
2017-01-01 00:20:00  1.65
2017-01-01 05:20:00  1.35
2017-01-02 00:20:00  1.75
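
As a possible alternative to the per-row converter, the timestamp column can be read as text and parsed in one vectorised step with pd.to_datetime. This is only a sketch: the two sample rows and the name df_alt are assumptions made for illustration, not part of the original data file.

```python
import io
import pandas as pd

# Two sample rows in the question's format (ddmmYYYYHHMM, then height).
raw = io.StringIO("010120170010  1.700\n010120170020  1.650\n")

# Read the timestamp column as text so the leading zero is preserved.
df_alt = pd.read_csv(raw, header=None, sep=r"\s+",
                     names=["Time", "Tide"], dtype={"Time": str})

# Parse the whole column at once, then use it as the index.
df_alt["Time"] = pd.to_datetime(df_alt["Time"], format="%d%m%Y%H%M")
df_alt = df_alt.set_index("Time")
```

The resulting frame has the same shape as the one shown above: a datetime index named Time and a single Tide column.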

You can now filter the dataframe, e.g. to select only the data from the first of January:

df["2017-01-01":"2017-01-01"]
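
For the "where date = today()" case from the question, one option is a boolean mask on the index. The following is a minimal sketch using made-up constant heights; the names sample, day and one_day are hypothetical.

```python
import pandas as pd

# Hypothetical sample: 10-minute readings spanning two days.
idx = pd.date_range("2017-01-01 00:00", "2017-01-02 23:50", freq="10min")
sample = pd.DataFrame({"Tide": [1.5] * len(idx)}, index=idx)

# The day to keep; pd.Timestamp.today().normalize() would give today's date.
day = pd.Timestamp("2017-01-01")

# normalize() sets each timestamp to midnight, so the comparison
# keeps exactly the rows that fall on the chosen calendar day.
one_day = sample[sample.index.normalize() == day]
```

With readings every 10 minutes, one_day holds the 144 rows of the chosen day.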

You could plot the data directly:

df["2017-01-01":"2017-01-01"].plot(kind="bar")

or

df["2017-01-01 00:00":"2017-01-01 06:00"].plot(kind="bar")

This works nicely if the times are equally spaced, because it creates a categorical bar plot. (Remember that you may need to call pyplot.show() when working in a script.)

You may also use matplotlib directly to draw the bars:

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates

df1 = df["2017-01-01 00:00":"2017-01-01 06:00"]
plt.bar(df1.index,df1.Tide, 
        width=np.diff(matplotlib.dates.date2num(df1.index.to_pydatetime()))[0], ec="k")
plt.show()

To get control over the x-axis ticks and labels, this latter matplotlib solution is the way to go. First align the bars to their left edges with align="edge". Then use locators and formatters from matplotlib.dates, as shown in the official dates example. A grid can be added with plt.grid.

plt.bar(df1.index,df1.Tide, 
        width=np.diff(matplotlib.dates.date2num(df1.index.to_pydatetime()))[0], 
        align="edge", ec="k")

hours = matplotlib.dates.HourLocator()   # every hour
#hours = matplotlib.dates.HourLocator(byhour=range(24)[::3]) # every 3 hours
fmthours=matplotlib.dates.DateFormatter("%m-%d %H:%M")
plt.gca().xaxis.set_major_locator(hours)
plt.gca().xaxis.set_major_formatter(fmthours)

plt.grid(True, axis="x", linewidth=1, color="k")
plt.gcf().autofmt_xdate()
plt.show()
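
The major/minor divisions mentioned in the comments could be handled the same way, by combining a major and a minor locator. The sketch below assumes a line plot of made-up sinusoidal heights and selects the non-interactive Agg backend so it runs without a display; neither assumption comes from the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical one-day series at 10-minute spacing.
times = pd.date_range("2017-01-01", periods=144, freq="10min")
heights = 1.5 + np.sin(np.linspace(0, 4 * np.pi, 144))

fig, ax = plt.subplots()
ax.plot(times, heights)  # every data point is plotted

# Labelled major ticks every 6 hours, unlabelled minor ticks every hour.
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 6)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.xaxis.set_minor_locator(mdates.HourLocator())

# Heavier grid lines on the major ticks, lighter ones on the minor ticks.
ax.grid(True, which="major", axis="x", linewidth=1)
ax.grid(True, which="minor", axis="x", linewidth=0.3)
```

All 144 points are drawn, while only the 6-hour ticks carry labels; call plt.show() or fig.savefig(...) to view the result.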

The bar chart with hourly ticks and grid lines may then look something like the following (image not included here).

Thank you so much for your help, filtering the dataframe explains things perfectly. — Arvo, Jun 24 '17 at 06:47
Thanks again. I was messing around with a URL rather than a file location and was getting SSL verify errors (Windows 7, Python 3.6 x32). It worked with Google, so I'm assuming the URL I'm trying to access isn't in the Python OpenSSL CA list. Manually installing the cert fixed things anyway. The URL in question is: https://www.tmr.qld.gov.au/-/media/aboutus/corpinfo/Open%20data/predictiveintervaldata/Brisbane-Bar/P046046A_2017.csv — Arvo, Jun 24 '17 at 08:47
