Beginner question here.

What I'm trying to build: a program that reads data from a CSV and creates a calendar heat map from it. I am a language learner (human languages such as Spanish, Japanese, etc.), and the data set I'm using is a CSV that records how many hours per day I spent immersing in my target language. I want each cell of the heat map to show the number of hours. The y-axis will be days of the week, and the x-axis will be months.

What I have tried: I have tried many methods over the past two days (most of them using seaborn), all of which have resulted in error-infested spaghetti code...

The method I'm using today is with calmap. Here is what I have so far:

import seaborn as sns
import matplotlib as plt
import numpy as np
from vega_datasets import data as vds
import calmap
import pandas as pd
import calplot


# importing CSV from google drive
df = pd.read_csv('ImmersionHours.csv', names=['Type', 'Name', 'Date', 'Time', 'Total Time'])

# deleting extraneous row of data
df.drop([0], inplace=True)

# making sure dates are in datetime format
df['Date'] = pd.to_datetime(df['Date'])

# setting the dates as the index
df.set_index('Date', inplace=True)


# the data is now formatted how I want


# creating a series for the heat map values
hm_values = pd.Series(df.Time)

# trying to create the heat map from the series (hm_values)
calmap.yearplot(data=hm_values, year=2021)

Here is a copy of the data set that I imported into Python, for reference: https://docs.google.com/spreadsheets/d/1owZv0NDLz7S4R5Spf-hzRDGMTCS1FVSMvi0WsZJenWE/edit?usp=sharing

Can someone tell me where I'm going wrong and why the heat map won't show? Thank you in advance for any advice/tips/corrections.

  • Does `df['Time']` contain the values you want to show? Are they numeric? If so, use `calmap.yearplot(df['Time'], year=2021)`. Probably the biggest problem is how you call `read_csv` and then remove the first row. You really need `read_csv` to skip that row directly. Otherwise, pandas cannot set the correct type for each column. You can try `df.info(verbose=True)` to check the types of all columns (dtype `object` for a column that is not meant to hold strings hints at a problem). – JohanC May 14 '21 at 18:51
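A minimal sketch of what the comment above suggests, assuming the extraneous first row is a header line. The inline CSV text is a hypothetical stand-in for `ImmersionHours.csv` (the real file's values may differ), and summing the sessions per day is an illustrative choice, not something the question specifies:

```python
import io

import pandas as pd

# Hypothetical stand-in for ImmersionHours.csv (assumption: its first row is a header).
csv_text = """Type,Name,Date,Time,Total Time
Reading,Book,2021-05-01,1.5,1.5
Listening,Podcast,2021-05-02,2.0,3.5
Listening,Podcast,2021-05-02,0.5,4.0
"""

# header=0 makes pandas treat the first row as a header that `names` replaces,
# so that row is never parsed as data and the column dtypes are inferred correctly.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=["Type", "Name", "Date", "Time", "Total Time"],
    parse_dates=["Date"],
    index_col="Date",
)

# Check the column types; 'Time' should be numeric rather than object.
df.info(verbose=True)

# One numeric value per date: add up all sessions logged on the same day.
hours_per_day = df["Time"].groupby(level=0).sum()

# Plotting step (requires calmap and matplotlib to be installed):
# import calmap
# import matplotlib.pyplot as plt
# calmap.yearplot(hours_per_day, year=2021)
# plt.show()  # needed to display the figure when running as a script
```

If `Time` still shows up as `object` in the `df.info` output, the column probably contains non-numeric text that needs cleaning before plotting.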

1 Answer

The question is a bit old, but in case anyone is interested, I had the same problem and found this notebook very helpful for solving it: https://github.com/amandasolis/Fitbit/blob/master/FitbitSummaryPlots.ipynb

import numpy as np
import pandas as pd
import calmap

# The file has no header row, so column names are supplied; dates are DD/MM/YYYY (dayfirst).
fulldf = pd.read_csv("./data.csv", header=None, names=['date', 'duration', 'frac'],
                     usecols=['date', 'frac'], index_col=0,
                     parse_dates=['date'], dayfirst=True)
fulldf.index = pd.to_datetime(fulldf.index)  # make sure the index is a DatetimeIndex
events = pd.Series(fulldf['frac'])
calmap.yearplot(events, year=2022)  # the notebook linked above has a better but more complex visualization

First lines of data.csv (I plot frac, the third column, not duration, but it should be similar):

03/11/2022,1,"0.0103"
08/11/2022,1,"0.0103"
15/11/2022,1,"0.0103"
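As a rough check of how rows like these are parsed, here is a small sketch that reads the three sample lines from a string instead of `./data.csv`. Using an explicit `format="%d/%m/%Y"` is a swapped-in alternative to `dayfirst=True`, chosen here so that `03/11/2022` cannot be read as March 11:

```python
import io

import pandas as pd

# The three sample rows shown above, read from memory instead of ./data.csv.
sample = '03/11/2022,1,"0.0103"\n08/11/2022,1,"0.0103"\n15/11/2022,1,"0.0103"\n'

df = pd.read_csv(io.StringIO(sample), header=None, names=["date", "duration", "frac"])

# Parse the dates with an explicit day/month/year format.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Make sure the quoted values are numeric before plotting.
df["frac"] = pd.to_numeric(df["frac"])

# A numeric Series indexed by date, the shape passed to calmap.yearplot above.
events = df.set_index("date")["frac"]
print(events)
```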
rfs