0

I have a chatbot that will ask questions and receive answers in a key-value format. The reason I choose a key-value pair is that it allows me to add new questions to the bot without having to migrate to the DB schema every time I make a change.

For example, a sample data set may be:

(e.g. - key, value) 1. temperature, 72 2. heart_rate, 65 3. sleep_duration, 8

I want to answer questions like what's heart_rate based on the amount of sleep that day and see those values over time. Or just easy sums, like what is the total amount of sleep per week compared to the average heart rate.

I started exploring pandas and I want to create a data frame, where each of these key types could be their own column, and the value in the column would be the corresponding value from the pair. Each of these entries shares a common attribute of the timestamp from when it was recorded.

How would I go about this? I originally was doing something like

#read in CSV and initialize an empty new data frame
input_data = pd.read_csv('raw_data.csv')
temperature_df = pd.DataFrame

#create a data frame which keeps the timestamp and only the value of the column 
temperature_df = input_data.loc[input_data['key'] == 'temperature']['value']

However, in the example above, it is still missing the ability to pull in multiple columns into this new data frame like a timestamp column. On top of that, I'd end up with multiple new data frames, and I'd have to merge them if I wanted to plot them and I'm not sure how to do that as well if some tables may have multiple rows for the same timestamp while others may not.

Here's an example of the input and desired output for comparison:

id  timestamp       key             value
1   1576086899000   temperature     70
2   1576086899000   sleep           8
3   1576086899000   heartrate       65
4   1576086876000   temperature     72
5   1576086876000   sleep           7.5
6   1576086876000   heartrate       62
7   1576086866000   temperature     74
8   1576086866000   sleep           7.8
9   1576086866000   heartrate       64


Ideally, I can get to:

timestamp           temperature     sleep   heartrate
1576086899000       70              8       65      
1576086876000       72              7.5     62
1576086866000       74              7.5     64

I understand there's a lot here, so thank you for the help!

0 Answers0