1

tldr; How do I convert DateTimeIndex back to a column in a dataframe?

The long explanation:

I have accelerometer and loadcell data for the same time period, but the timestamps for each device are in different formats and timezones. My current theory is that I need them to be in the same format so that I will be able to merge both datasets into one dataframe.

My first step was to expedite this process as much as possible by reading in the data correctly. To that end, I combined the date and time data for the Load Data and renamed that column to "Time" so that it would be the same as the Accelerometer data.

import pandas as pd
import datetime as dt

path = r"C:\Users\afhal\Dropbox\Arboriculture\ThesisData\Kane_Shared_Folder\2020Summer"
load_path = path + "\\" + r"ulmus15\Reference\ulmus15loads.csv"
pull_path = path + "\\"+ r"ulmus15\Reference\ulmus15pulls.csv"

df_pull = pd.read_csv(pull_path, skiprows =22, parse_dates = ["Time"])
print(df_pull)
df_load = pd.read_csv(load_path, parse_dates = [["Date","Time"]])
df_load = df_load.rename(columns = {"Date_Time": "Time", "Elapsed mS": "Elapsed S", "Climber": "Load"})
print(df_load)

Results for Accelerometer Data

              Time  12159:ch4  12159:ch5  12160:ch4  12160:ch5 <br>

0 2020-07-25 13:20:46 52.56000 -27.23 73.34 8.86
1 2020-07-25 13:20:47 52.18000 -27.04 73.02 8.99
2 2020-07-25 13:20:48 52.18000 -27.03 73.02 8.98
3 2020-07-25 13:20:49 52.35000 -27.47 73.04 8.97
4 2020-07-25 13:20:50 52.31000 -26.84 73.03 8.98
... ... ... ... ... ...
9266 2020-07-26 11:22:49 81.07000 0.19 80.64 -0.32
9267 2020-07-26 11:22:50 81.75999 -0.08 80.63 -0.32
9268 2020-07-26 11:22:51 81.06000 0.18 80.63 -0.32
9269 2020-07-26 11:22:52 81.77000 -0.08 80.64 -0.31
9270 2020-07-26 11:22:53 NaN NaN 80.64 -0.31

[9271 rows x 5 columns]

Results for Load Cell Data

    Time  Elapsed S  Load

0 2020-07-25 09:26:28 1002 2
1 2020-07-25 09:26:29 2011 2
2 2020-07-25 09:26:30 3035 3
3 2020-07-25 09:26:31 4037 2
4 2020-07-25 09:26:32 5060 2
... ... ... ...
1302 2020-07-25 09:48:24 1317109 -2
1303 2020-07-25 09:48:25 1318135 -2
1304 2020-07-25 09:48:26 1319165 -1
1305 2020-07-25 09:48:27 1320177 -2
1306 2020-07-25 09:48:28 1321178 -1

[1307 rows x 3 columns]

I needed to convert the Accelerometer Data to US/Eastern. My understanding from reading other StackOverflow questions is that I need to change my time column from a pandas DateTime to a pandas DateTimeIndex. I accomplished this with the following code:

df_test = df_pull.set_index('Time').index.astype('datetime64[ns]')
df_test.tz_localize('UTC').tz_convert('US/Eastern')

And this has put me thoroughly in the weeds.

  • Problem 1: I seem to have destroyed my other columns in the process of creating a DateTimeIndex--how do I get them back?
  • Problem 2: A DateTimeIndex is not the same as a DateTime object, and I do not think I can merge them unless I make the DateTimeIndex back into a DateTime object.
AHalperin
  • 49
  • 1
  • 9
  • Please revise this to just *one* question. Someone can surely help you in that case - this is a bit long – anon01 Dec 05 '20 at 22:18
  • Hope that's a bit better? – AHalperin Dec 05 '20 at 22:43
  • its better. what output are you expecting exactly? Can you be more specific about `but I need to correlate the data in one file`? You can still break this up into separate questions. One of them sounds like timezone issues, there is another about joining the data – anon01 Dec 05 '20 at 22:52
  • Thank you for your patience and feedback. Edited again. – AHalperin Dec 05 '20 at 22:59

1 Answers1

1

After a great deal of fussing around the Internet, I solved my problem!

# CREATE a DateTimeIndex
pull_index = df_pull.set_index('Time').index.astype('datetime64[ns]')

# Localize time with tz
pull_index = pull_index.tz_localize('UTC').tz_convert('US/Eastern')

# Back to an naive datetimeindex! https://stackoverflow.com/questions/36292959/pandas-merge-data-frames-on-datetime-index
pull_index = pull_index.tz_localize(None)
print(pull_index)

# Create a datetimeindex for the other dataset
load_index = df_load.set_index('Time').index.astype('datetime64[ns]')
print(load_index)

# Add datetimeindex to datasets and delete extra columns
df_pull=df_pull.set_index(pull_index)
df_pull.drop("Time",axis=1,inplace=True)
print(df_pull)

df_load=df_load.set_index(load_index)
df_load.drop("Time",axis=1,inplace=True)
print(df_load)

# Merge by datetimeindex
merge=pd.merge(df_pull,df_load, how='inner', left_index=True, right_index=True)
AHalperin
  • 49
  • 1
  • 9