0

Here is my problem.

I have a dataframe imported from a .xlsx file. It contains one column with dates but the problem is it is not presented as datetime.

For instance, in the first line, it is the date (format: DD//MM/YYYY (type str) and the 24 following lines are the hours (format: xh (h for hour, x is between 0 and 23)). This is repeated for three years.

I would like to transform this column so as the cells will be datetime in format YY-MM-DD HH:MM:SS.

First of all, I created a dataframe df2 containing the hours:

indexNames = df1[df1['Hour'].str.contains('/')].index
df2= df1.drop(indexNames)

I transformed it to get it as datetime format HH:MM

# Conserving the number
new = df2["Hour"].str.split("h", n = 1, expand = True) 
new["new_Hour"]= new[0]

# Dropping old Name columns
df2.drop(columns = ["Hour"], inplace = True)

# Transforming in datetime format
df2['new_Hour'] = pd.to_datetime(df2['new_Hour'], format="%H")
df2['new_Hour'] = df2['new_Hour'].astype(str)
nouv = df2['new_Hour'].str.split(' ', n=1, expand = True)
df2["Hour"]= nouv[1]
df2.drop(columns = ["new_Hour"], inplace = True)

Then, I created a second dataframe having the date and added separated columns for corresponding year, month and day:

df3= df1.loc[df1['Hour'].str.contains('/')].copy()
df3['Hour'] = pd.to_datetime(df3['Hour'], format="%d/%m/%Y")
df3['year'] = df3['Hour'].dt.year
df3['month'] = df3['Hour'].dt.month
df3['day'] = df3['Hour'].dt.day

Here comes my problem,

df3 indexes are strating at 0 and taking +25 at each line. It means df3.index[0] = 0, df3.index[1] = 25, df3.index[2] = 50 etc

df2 indexes are starting at 1 and more genarally, indexes of df3 are missing.

I would like to add the corresponding date of df3 to the corresponding hours of df2.

After having reseted indexes of ddf2 and df3, I tried:

df4 = df2.copy()
df4['year'] = 2019
df4= df4.reset_index(drop = True)
for i in range(len(df3)-1):
    df4['year'].iloc[df3.index[i]:df3.index[i+1]] = df3['year'][i]

But I get copy problems and probably indexes problems too.

Hope you could help me, thanks.

Jorisltc
  • 15
  • 7

1 Answers1

1

you might want to start out with a cleaner way to create a datetime column? e.g. like

import pandas as pd

# dummy sample...
df = pd.DataFrame({'Hour': ["10/12/2013", "0", "1", "3",
                            "11/12/2013", "0", "1", "3"]})

# make a date column, forward-fill the dates
df['Datetime'] =  pd.to_datetime(df['Hour'], format="%d/%m/%Y", errors='coerce').fillna(method="ffill")

# now we can add the hour
df['Datetime'] = df['Datetime'] + pd.to_timedelta(pd.to_numeric(df['Hour'], errors='coerce'), unit='h')

# and optionally drop nans in the Datetime column, i.e. where we had dates initially
df = df[df["Datetime"].notna()].reset_index(drop=True)

df
  Hour            Datetime
0    0 2013-12-10 00:00:00
1    1 2013-12-10 01:00:00
2    3 2013-12-10 03:00:00
3    0 2013-12-11 00:00:00
4    1 2013-12-11 01:00:00
5    3 2013-12-11 03:00:00
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • Thank you so much, you are a genius ! It works ! Just to fully understand, what does the method "coerse" do ? Same for .fillna ? Thank you so much ! – Jorisltc Jun 20 '22 at 12:19
  • I have a more global question, may i ask you ? How can I create a new column to a dataframe but choosing where I put my values ? For instance, I always try to do df1['new column'].loc[df1['older column] == condition] = df2['column'] But this raises errors concerning slice of copy – Jorisltc Jun 20 '22 at 12:26
  • 1
    @Jorisltc `errors='coerce'` is useful if you want NaN (or NaT) for values that cannot be converted, as the hours in this case. `fillna` replaces NaN values; here you can use it to fill in the valid dates using the forward-fill method. – FObersteiner Jun 20 '22 at 12:30
  • @Jorisltc concerning your second comment, do you mean [this error](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas)? – FObersteiner Jun 20 '22 at 12:33
  • Thank you, really ingenious, I didn't know those methods ! – Jorisltc Jun 20 '22 at 12:34
  • Yes, such as the link you posted. Also when I want to drop columns for instance, this error occurs. So if I understand I can choose to disable this warning ? – Jorisltc Jun 20 '22 at 12:37
  • @Jorisltc hm I think you could try `df1.loc[df1['older column] == condition, 'new column'] = df2['column']` – FObersteiner Jun 20 '22 at 12:42
  • Ok, I will try this next time I deal with this problem Thank you for your answers and all your explanations, it was really helpful – Jorisltc Jun 20 '22 at 12:49