I have a dataset (a list of lists) where each inner list is a row with two columns: a date string and a sample value.
I first parse each date string and build a new datetime from just its year, month, and day, so that all samples from the same day share one key.
I then use a dictionary to collect the samples for each day, with the date as the key and a list of samples as the value. Along the way I drop values that are None or not greater than 0.
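The day-grouping step can be sketched on its own. This is a minimal sketch, not the code from the question: to stay runnable without dateutil it uses a hypothetical helper, day_of, that skips the weekday name and the "CST" token and parses the rest with datetime.strptime. That assumes every row is in the same zone, so dropping the abbreviation doesn't change the calendar day. The weekday is skipped because some rows in the sample carry one that doesn't match the date (Feb 04 2022 was a Friday). The helper returns a datetime.date, which works as a dict key the same way a datetime at midnight does. The rows are copied from the dataset.

```python
import datetime

def day_of(stamp):
    # Hypothetical helper: "Wed Feb 02 22:51:17 CST 2022" -> datetime.date(2022, 2, 2).
    # Skips the weekday name and the zone abbreviation (assumes one zone for all rows).
    _weekday, month, day, clock, _zone, year = stamp.split()
    parsed = datetime.datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    return parsed.date()

rows = [
    ["Wed Feb 02 23:51:17 CST 2022", 9607637.0],
    ["Thu Feb 03 00:21:17 CST 2022", 9607766.0],
    ["Thu Feb 06 10:21:18 CST 2022", 0.0],
    ["Thu Feb 06 10:21:18 CST 2022", None],
    ["Thu Feb 06 10:51:18 CST 2022", 9610646.0],
]

daily = {}
for stamp, value in rows:
    # Same filter as the question: drop None and anything not greater than 0.
    if value is not None and value > 0:
        daily.setdefault(day_of(stamp), []).append(value)

print(daily)
```

The 0.0 and None rows for Feb 06 are dropped, so each day ends up with only its positive samples.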
After that, I remove duplicate values from each daily list. I couldn't figure out how to do that inside the first for loop, so I used a dictionary comprehension instead. Could someone show me how to drop None values, values that are not greater than 0, and duplicates in a single for loop?
I would think the filter function could help with finding None values, values <= 0, and duplicates.
I would also like to preserve the original order, but calling list(set()) shuffles it for some reason. Why does that happen?
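On the ordering question: a set is an unordered collection, so list(set(v)) makes no promise about the order it returns. Dicts, on the other hand, keep insertion order (guaranteed since Python 3.7), so dict.fromkeys(v) is a common order-preserving way to drop duplicates. filter() can handle the None / not-positive check with a predicate, but because it looks at one item at a time it won't remove duplicates on its own. A minimal sketch on one day's worth of values (the Feb 05 samples from the dataset, with a None and a 0.0 mixed in for illustration):

```python
values = [9609363.0, 9609363.0, 9609504.0, None, 0.0, 9609645.0]

# filter() keeps the items for which the predicate is true: drops None and anything <= 0.
positive = filter(lambda x: x is not None and x > 0, values)

# dict.fromkeys() keeps the first occurrence of each value, in insertion order.
unique_in_order = list(dict.fromkeys(positive))

print(unique_in_order)  # [9609363.0, 9609504.0, 9609645.0]
```

Applied to the existing dictionary, the same idea would be dict1 = {k: list(dict.fromkeys(v)) for k, v in dict1.items()}.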
import datetime
import itertools
from collections import defaultdict
from dateutil.parser import parse
from dateutil.tz import gettz
ds = [["Wed Feb 02 22:51:17 CST 2022", 9607377.0],
["Wed Feb 02 23:21:17 CST 2022", 9607507.0],
["Wed Feb 02 23:51:17 CST 2022", 9607637.0],
["Thu Feb 03 00:21:17 CST 2022", 9607766.0],
["Thu Feb 03 00:51:17 CST 2022", 9607896.0],
["Thu Feb 03 01:21:17 CST 2022", 9608026.0],
["Thu Feb 03 01:51:17 CST 2022", 9608158.0],
["Thu Feb 03 02:21:17 CST 2022", 9608289.0],
["Thu Feb 03 02:51:17 CST 2022", 9608421.0],
["Thu Feb 06 10:21:18 CST 2022", 0.0],
["Thu Feb 03 03:21:17 CST 2022", 9608556.0],
["Thu Feb 03 03:51:17 CST 2022", 9608691.0],
["Thu Feb 04 04:21:17 CST 2022", 9608822.0],
["Thu Feb 04 04:51:17 CST 2022", 9608956.0],
["Thu Feb 04 05:21:18 CST 2022", 9609092.0],
["Thu Feb 04 05:51:18 CST 2022", 9609228.0],
["Thu Feb 05 06:21:18 CST 2022", 9609363.0],
["Thu Feb 05 06:21:18 CST 2022", 9609363.0],
["Thu Feb 05 06:51:18 CST 2022", 9609504.0],
["Thu Feb 05 07:21:18 CST 2022", 9609645.0],
["Thu Feb 05 07:51:18 CST 2022", 9609787.0],
["Thu Feb 05 08:21:18 CST 2022", 9609925.0],
["Thu Feb 05 08:51:18 CST 2022", 9610068.0],
["Thu Feb 06 09:51:18 CST 2022", 9610358.0],
["Thu Feb 06 10:21:18 CST 2022", 9610503.0],
["Thu Feb 06 10:21:18 CST 2022", None],
["Thu Feb 06 10:51:18 CST 2022", 9610646.0]]
tz_dict = {"CST": gettz('America/Chicago')}
time_delta = datetime.timedelta(days=1)
dict1 = {}
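for col in ds:
    # Parse the timestamp, resolving the "CST" abbreviation through tz_dict.
    date = parse(col[0], tzinfos=tz_dict)
    # Truncate to midnight so every sample from the same day shares one key.
    new_date = datetime.datetime(date.year, date.month, date.day)
    # Keep only real, strictly positive samples.
    if col[1] is not None and col[1] > 0:
        dict1.setdefault(new_date, []).append(col[1])
# Remove duplicates within each day (list(set(v)) does not keep the original order).
dict1 = {k: list(set(v)) for k, v in dict1.items()}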
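A single-loop version can be sketched under a few assumptions. For each day it keeps a set of values already stored alongside the ordered list, so each duplicate check is a set lookup while the list keeps first-seen order; checking value not in daily[day] directly would also work, just with a linear scan per row. Duplicates are only removed within a day, as in the comprehension above. To run with only the standard library it uses a hypothetical helper, day_of, in place of dateutil's parse; it drops the weekday name and the zone abbreviation (assuming all rows share one zone) and returns a datetime.date key rather than a datetime at midnight. The rows are a subset of ds.

```python
import datetime

def day_of(stamp):
    # Hypothetical helper: parses "Wed Feb 02 22:51:17 CST 2022" to datetime.date(2022, 2, 2),
    # ignoring the weekday name and the zone abbreviation (assumes one zone for all rows).
    _weekday, month, day, clock, _zone, year = stamp.split()
    return datetime.datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y").date()

def group_daily(rows):
    """Group positive, non-None samples by day, dropping per-day duplicates in one pass."""
    daily = {}  # day -> samples in first-seen order
    seen = {}   # day -> set of samples already stored for that day
    for stamp, value in rows:
        if value is None or value <= 0:
            continue  # missing or non-positive sample
        day = day_of(stamp)
        day_seen = seen.setdefault(day, set())
        if value in day_seen:
            continue  # duplicate within this day
        day_seen.add(value)
        daily.setdefault(day, []).append(value)
    return daily

rows = [
    ["Thu Feb 06 10:21:18 CST 2022", 0.0],
    ["Thu Feb 05 06:21:18 CST 2022", 9609363.0],
    ["Thu Feb 05 06:21:18 CST 2022", 9609363.0],
    ["Thu Feb 05 06:51:18 CST 2022", 9609504.0],
    ["Thu Feb 06 10:21:18 CST 2022", 9610503.0],
    ["Thu Feb 06 10:21:18 CST 2022", None],
    ["Thu Feb 06 10:51:18 CST 2022", 9610646.0],
]

print(group_daily(rows))
```

On the full dataset this would be group_daily(ds). If the aware datetimes from dateutil are needed, parse(stamp, tzinfos=tz_dict).date() could stand in for day_of.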