The time in my CSV file is split across 4 columns (year, Julian day, hour/minute (UTC) and second), and I want to combine them into a single column that looks like this: 14/11/2017 00:16:00.

Is there an easy way to do this?

A sample of the code is

cols = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
D14 = pd.read_csv(r'C:\Users\William Jacondino\Desktop\DadosTimeSeries\PIRATA-PROFILE\Dados FLUXO\Dados_brutos_copy-20220804T151840Z-002\Dados_brutos_copy\tm_data_2017_11_14_0016.dat', header=None, usecols=cols, names=["Year","Julian day", "Hour/minut (UTC)", "Second", "Bateria (V)", "PTemp (°C)", "Latitude", "Longitude", "Magnectic_Variation (arb)", "Altitude (m)", "Course (º)", "WS", "Nmbr_of_Satellites (arb)", "RAD", "Tar", "UR", "slp",], sep=',')
D14 = D14.loc[:, ["Year","Julian day", "Hour/minut (UTC)", "Second", "Latitude", "Longitude","WS", "RAD", "Tar", "UR", "slp"]]

My array looks like this:

[screenshot of the DataFrame]

The file: csv file sample

In the "Hour/minut (UTC)" column, the first two digits are the hour (UTC) and the last two digits are the minute.

The values in the "Hour/minut (UTC)" column start at 016, which means hour 0 UTC and minute 16, and go up to hour 12 UTC and minute 03.

I want to merge everything into a single datetime column, from the first to the last row of the array:

1 - 2017

1412 - 14/11/2017 12:03:30

However, for hours 0 to 9 the "Hour/minut (UTC)" column has only one hour digit, for example 9 instead of 09.

How do I create the array with the correct datetime?

  • Please add a [MRE](https://stackoverflow.com/help/minimal-reproducible-example) (also look [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)), don't use images. – Timus Aug 16 '22 at 08:19

2 Answers

You can create a new column that combines the data from other columns.

For example, if you have a dataframe like so:

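import pandas as pd

# Sample data matching the printed frame
df = pd.DataFrame({
    "year": [2010, 2010, 2020], "month": ["jan", "feb", "mar"], "day": [1, 2, 3],
    "a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9],
})

# Print df: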

   year month  day  a  b  c
0  2010   jan    1  1  4  7
1  2010   feb    2  2  5  8
2  2020   mar    3  3  6  9

You can add a new column to the DataFrame, built from the values in the year, month and day columns.

df['newColumn'] = df.year.astype(str) + '-' + df.month + '-' + df.day.astype(str)

Edit: In your situation, use df['Julian day'].astype(str) instead of df.month. The column name is different and contains a space, so it has to be accessed with brackets rather than as an attribute, and its integer values have to be converted to strings first.

The data in the new column will be strings in whatever format you build. You can replace the dash '-' with a slash '/' or any other separator you need. Integer columns just have to be converted to strings with .astype(str) before concatenating.

Output:

   year month  day  a  b  c   newColumn
0  2010   jan    1  1  4  7  2010-jan-1
1  2010   feb    2  2  5  8  2010-feb-2
2  2020   mar    3  3  6  9  2020-mar-3

After that you can do anything as you would on a dataframe object.
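Applied to the column names from the question, the concatenation idea could look like the sketch below. It is only an illustration: the two-row frame and the column names `year_doy` and `date` are made up here, and it assumes "Julian day" means day of the year, so that the `%j` directive can turn the string into a real date.

```python
import pandas as pd

# Hypothetical two-row sample using the question's column names
df = pd.DataFrame({"Year": [2017, 2018], "Julian day": [318, 10]})

# Concatenate as strings; zero-pad the day of year to three digits
df["year_doy"] = (
    df["Year"].astype(str) + "-" + df["Julian day"].astype(str).str.zfill(3)
)

# %j is the day-of-year directive, so this gives calendar dates
df["date"] = pd.to_datetime(df["year_doy"], format="%Y-%j")

print(df)
```

The string column on its own still shows the day of the year, so the parsing step is what produces a day/month date.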

If you only need it for data analysis, you can use the .groupby() method, which groups the rows by one or more columns so that you can aggregate each group.
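As a rough sketch of that idea, using the sample frame from this answer (the choice of grouping by year and summing the columns a, b and c is just an assumption for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2010, 2010, 2020], "month": ["jan", "feb", "mar"], "day": [1, 2, 3],
    "a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9],
})

# One row per year, with a, b and c summed within each year
totals = df.groupby("year")[["a", "b", "c"]].sum()

print(totals)
```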

Osiris

If your dataframe looks like

import pandas as pd

df = pd.DataFrame({
    "year": [2017, 2017], "julian day": [318, 318], "hour/minut(utc)": [16, 16],
    "second": [0, 30],
})
   year  julian day  hour/minut(utc)  second
0  2017         318               16       0
1  2017         318               16      30

then you could use pd.to_datetime() and pd.to_timedelta() to do

df["datetime"] = (
    pd.to_datetime(df["year"].astype("str"), format="%Y")
    + pd.to_timedelta(df["julian day"] - 1, unit="days")
    + pd.to_timedelta(df["hour/minut(utc)"], unit="minutes")
    + pd.to_timedelta(df["second"], unit="seconds")
).dt.strftime("%d/%m/%Y %H:%M:%S")

and get

   year  julian day  hour/minut(utc)  second             datetime
0  2017         318               16       0  14/11/2017 00:16:00
1  2017         318               16      30  14/11/2017 00:16:30

The column datetime now contains strings. Remove the .dt.strftime("%d/%m/%Y %H:%M:%S") part at the end, if you want datetimes instead.
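A minimal sketch of why keeping real datetimes can be useful: the formatted text can live in a separate column while the datetime column stays available for time arithmetic. The column names `datetime_str` and `step` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2017, 2017], "julian day": [318, 318], "hour/minut(utc)": [16, 16],
    "second": [0, 30],
})

# Same sum as above, but without the final strftime: dtype is datetime64
df["datetime"] = (
    pd.to_datetime(df["year"].astype("str"), format="%Y")
    + pd.to_timedelta(df["julian day"] - 1, unit="days")
    + pd.to_timedelta(df["hour/minut(utc)"], unit="minutes")
    + pd.to_timedelta(df["second"], unit="seconds")
)

# Formatted copy for display only
df["datetime_str"] = df["datetime"].dt.strftime("%d/%m/%Y %H:%M:%S")

# Time arithmetic only works on the datetime column, e.g. the sampling step
df["step"] = df["datetime"].diff()

print(df.dtypes)
```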


Regarding your comment: If I understand correctly, you could try the following:

df["hours_min"] = df["hour/minut(utc)"].astype("str").str.zfill(4)
df["hour"] = df["hours_min"].str[:2].astype("int")
df["minute"] = df["hours_min"].str[2:].astype("int")
df = df.drop(columns=["hours_min", "hour/minut(utc)"])

df["datetime"] = (
    pd.to_datetime(df["year"].astype("str"), format="%Y")
    + pd.to_timedelta(df["julian day"] - 1, unit="days")
    + pd.to_timedelta(df["hour"], unit="hours")
    + pd.to_timedelta(df["minute"], unit="minutes")
    + pd.to_timedelta(df["second"], unit="seconds")
).dt.strftime("%d/%m/%Y %H:%M:%S")

Result for the sample dataframe df

df = pd.DataFrame({
    "year": [2017, 2017, 2018, 2019], "julian day": [318, 318, 10, 50],
    "hour/minut(utc)": [16, 16, 234, 1201], "second": [0, 30, 1, 2],
})
   year  julian day  hour/minut(utc)  second
0  2017         318               16       0
1  2017         318               16      30
2  2018          10              234       1
3  2019          50             1201       2

would be

   year  julian day  second  hour  minute             datetime
0  2017         318       0     0      16  14/11/2017 00:16:00
1  2017         318      30     0      16  14/11/2017 00:16:30
2  2018          10       1     2      34  10/01/2018 02:34:01
3  2019          50       2    12       1  19/02/2019 12:01:02
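If the column was read as integers (an assumption; a column read as text would need `.astype("int")` first), the same split could also be sketched with integer arithmetic instead of zero-padded strings: integer division by 100 gives the hour and the remainder gives the minute. The variable name `hhmm` is made up here.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2017, 2017, 2018, 2019], "julian day": [318, 318, 10, 50],
    "hour/minut(utc)": [16, 16, 234, 1201], "second": [0, 30, 1, 2],
})

hhmm = df["hour/minut(utc)"]

df["datetime"] = (
    pd.to_datetime(df["year"].astype("str"), format="%Y")
    + pd.to_timedelta(df["julian day"] - 1, unit="days")
    + pd.to_timedelta(hhmm // 100, unit="hours")    # 234 -> 2, 16 -> 0
    + pd.to_timedelta(hhmm % 100, unit="minutes")   # 234 -> 34, 16 -> 16
    + pd.to_timedelta(df["second"], unit="seconds")
).dt.strftime("%d/%m/%Y %H:%M:%S")

print(df)
```

This avoids the helper columns, at the cost of relying on the values being numeric.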
Timus
  • Thank you very much! That's exactly what I was trying to do! – William Jacondino Aug 16 '22 at 17:34
  • My "Hour/minut (UTC)" column has the first two strings referring to hour and the other two referring to minute in the same column, do you know how I can separate the two? Hour/minut (UTC) has: 016 where 0 is 0 UTC and 16 is minutes. How can i split this? Thanks in advance! – William Jacondino Aug 16 '22 at 17:50
  • @WilliamJacondino I'm not completely sure about the input, but check my edit, maybe that's what you're looking for? – Timus Aug 16 '22 at 18:13
  • Thanks, Timues! My column "hour/minut(utc)" has two problems: The starting hour 0 through 9 only has one value, and starting at hour 10 it has two values in the column. I was trying to make the script understand that from hour 0 to hour 9 is only one value so that it doesn't take the last two digits that belong to the minute. – William Jacondino Aug 16 '22 at 18:19
  • @WilliamJacondino Sorry, this is really hard to understand without a full set of examples: Could you edit them in your question? My current understanding for column `hour/minut(utc)`: `1` -> 1 minute, `10` -> 10 minutes, `111` -> 1 hour and 11 minutes, `1234` -> 12 hours and 34 minutes. Is that correct? – Timus Aug 16 '22 at 18:25
  • Hi, Timus! I edit my question – William Jacondino Aug 16 '22 at 18:38
  • I hadn't seen your edit. Now I got it here! Thank you very much, my friend! – William Jacondino Aug 16 '22 at 18:46