1

I want a column that tells me the time elapsed since he first row (5/1/2002 at 6:00AM), to the last (11/20/2006 at 2:00PM). How can I create an extra column that tells me the hours passed since the beginning of 5/1/2002? Here is my dataframe:

         Date  Time (HHMM)         Site  Plot  Replicate  Temperature  \
1      5/1/2002          600  Barre Woods    21          7          4.5
2      5/1/2002          600  Barre Woods    31          9          6.5
3      5/1/2002          600  Barre Woods    10          2          5.3
4      5/1/2002          600  Barre Woods     2          1          4.0
5      5/1/2002          600  Barre Woods    13          4          5.5
6      5/1/2002          600  Barre Woods    11          3          5.0
7      5/1/2002          600  Barre Woods    28          8          5.0
8      5/1/2002          600  Barre Woods    18          6          4.5
9      5/1/2002         1400  Barre Woods     2          1         10.3
10     5/1/2002         1400  Barre Woods    31          9          9.0
11     5/1/2002         1400  Barre Woods    13          4         11.0
12     5/1/2002         1400  Barre Woods    18          6          6.5
13     5/1/2002         1400  Barre Woods    11          3         10.3
14     5/1/2002         1400  Barre Woods    10          2         10.5
15     5/1/2002         1400  Barre Woods    28          8         10.3
16     5/1/2002         1400  Barre Woods    21          7         10.5
17     5/1/2002         1400  Barre Woods    16          5         10.3
18    5/22/2002          600  Barre Woods    13          4          6.5
19    5/22/2002          600  Barre Woods    18          6          5.8
20    5/22/2002          600  Barre Woods     2          1          5.5
...          ...          ...          ...   ...        ...          ...
710  11/20/2006         1400  Barre Woods    31          9          7.4
  • 1
    Do something like this http://stackoverflow.com/questions/22923775/calculate-pandas-dataframe-time-difference-between-two-columns-in-hours-and-minu ... and why would you put your question in bold? – Overclover Jul 16 '16 at 09:48

1 Answers1

1

Simple:

  • read the file,
  • parse the date and time,
  • calculate the delta with the first date/time,
  • write the result.

Here is an implementation using file-like objects for the demo:

import datetime
import io

data = """\
         Date  Time (HHMM)         Site  Plot  Replicate  Temperature
1      5/1/2002          600  Barre Woods    21          7          4.5
2      5/1/2002          600  Barre Woods    31          9          6.5
3      5/1/2002          600  Barre Woods    10          2          5.3
710  11/20/2006         1400  Barre Woods    31          9          7.4
"""

date_fmt = "%m/%d/%Y %H%M"
fisrt_date = None

with io.StringIO(data) as src_file, io.StringIO(data) as dst_file:

    # copy the header
    dst_file.write(next(src_file))

    for line in src_file:
        parts = line.strip().split()
        if not parts:
            dst_file.write(line)
            continue

        timestamp = parts[1] + " " + parts[2]
        curr_date = datetime.datetime.strptime(timestamp, date_fmt)
        if fisrt_date is None:
            fisrt_date = curr_date
        delta = curr_date - fisrt_date
        dst_file.write(line.strip() + "  " + str(delta) + "\n")

    print(dst_file.getvalue())

You get:

       Date  Time (HHMM)         Site  Plot  Replicate  Temperature
1      5/1/2002          600  Barre Woods    21          7          4.5  0:00:00
2      5/1/2002          600  Barre Woods    31          9          6.5  0:00:00
3      5/1/2002          600  Barre Woods    10          2          5.3  0:00:00
710  11/20/2006         1400  Barre Woods    31          9          7.4  1664 days, 8:00:00

Of course, you can change the format of the delta values.

Laurent LAPORTE
  • 21,958
  • 6
  • 58
  • 103
  • I converted the CSV file into strings format and I am getting an error at the timestamp process, it is telling me: "IndexError: list index out of range" – Sergio Espejo Jul 17 '16 at 08:38
  • @Sergio Espejo: post the code sample and related stack trace to debug. By the way, if you really have a CSV file, consider using the [cvs](https://docs.python.org/2/library/csv.html) Python module. – Laurent LAPORTE Jul 17 '16 at 09:11
  • This is what my string looks like: '{"0":"Date","1":"Time (HHMM)","2":"CO2-Rh"}' I keep getting the error: "IndexError: list index out of range" Even after I ran pdb.run('mymodule.test()') – Sergio Espejo Jul 18 '16 at 04:47