0

I have a txt file with data and values like this one:

PP  C   timestamp   HR  RMSSD   SCL 
PP1 1   20120918T131600000  NaN NaN 80.239727 
PP1 1   20120918T131700000  61  0.061420    77.365127

and I am importing it like that:

df = pd.read_csv('data.txt','\t', header=0)

which gives me a nice looking dataframe: enter image description here

Running

df.columns

shows this result Index(['PP', 'C', 'timestamp', 'HR', 'RMSSD', 'SCL'], dtype='object').

Now when I am trying to convert the timestamp column into a datetime column:

df["datetime"] = pd.to_datetime(df["timestamp"], format='%Y%m%dT%H%M%S%f')

I get this: ValueError: time data 'timestamp' does not match format '%Y%m%dT%H%M%S%f' (match)

Any ideas would be appreciated.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
Zoi K.
  • 361
  • 2
  • 3
  • 11
  • 2
    Somehow the `timestamp` heading has gotten into a data row. – Barmar May 31 '22 at 20:07
  • I cannot replicate the error based on your sample data: `pd.to_datetime(df['timestamp'], format='%Y%m%dT%H%M%S%f')` works as expected. Does your dataset have data not in the following format: `'%Y%m%dT%H%M%S%f'`? – It_is_Chris May 31 '22 at 20:07
  • I don't see any seconds in your timestamp, it looks like hour, minute, microsecond to me. If you delete %S it would work – Chris May 31 '22 at 20:09
  • @Chris your suggestion gave me the value error `unconverted data remains: 000` – Zoi K. May 31 '22 at 20:11
  • @Barmar what do you mean? how could I fix this? – Zoi K. May 31 '22 at 20:12
  • @It_is_Chris at least the 40 first values in my dataset all of them have the "T" inside of it, and looks like that format. – Zoi K. May 31 '22 at 20:12
  • 2
    You need to fix how you're loading the data into the dataframe in the first place. You haven't shown that code, so we can't tell what you're doing wrong. – Barmar May 31 '22 at 20:14
  • 1
    @Chris It looks like 2 digits of seconds and 3 digits of milliseconds. – Barmar May 31 '22 at 20:14
  • I also can't reproduce the issue. You need to make a [mre] including example input. For specifics, see [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea May 31 '22 at 20:35
  • @wjandrea I edited my question. Please take a look. – Zoi K. May 31 '22 at 20:47
  • @ZoiK I still can't reproduce the issue. Firstly, SO converts tabs to spaces, but once I correct for that, the `.to_datetime` conversion works fine. – wjandrea May 31 '22 at 20:53
  • @wjandrea so having a fraction of what the exact file I am using looks like, is working fine for you? How is that possible since I gave you everything I have? What am I doing wrong? What is "SO"? – Zoi K. May 31 '22 at 20:55
  • 1
    @ZoiK Sorry, "SO" is Stack Overflow, and specifically I mean the Markdown renderer. What you should be doing is copy-pasting the code and data from your question into a fresh environment and testing it to make sure that it reproduces the issue. Based on what I can see here, it seems like the literal string `'timestamp'` has gotten into the `timestamp` column, but it's in the part of the data you haven't provided. – wjandrea May 31 '22 at 20:59
  • 1
    @wjandrea you are absolutely right. I just checked. the txt file I'm using contains a bunch of other same-looking tables one after the other. – Zoi K. May 31 '22 at 21:04

1 Answers1

0

First, the error message you're quoting is from the header row. It's trying to parse the literal string 'timestamp' as a timestamp, which is failing. If you're getting an error on an actual data row, show us that message.

All three of your posted data rows parse fine with your format in my testing:

>>> [pandas.to_datetime(s, format='%Y%m%dT%H%M%S%f') 
    for s in ['20120918T131600000', '20120918T131700000', 
              '20120918T131800000']]
[Timestamp('2012-09-18 13:16:00'), Timestamp('2012-09-18 13:17:00'), Timestamp('2012-09-18 13:18:00')]

I have no idea where you got format='%Y%m%dT%H%M%S%f'[:-3], which just removes the S%f from the format string, leaving it invalid. If you want to remove the last three digits of the data so that you ca just use %H%M%S instead of %H%M%S%f, you need to put the [:-3] on the timestamp data value, not the format.

Mark Reed
  • 91,912
  • 16
  • 138
  • 175
  • 3
    Please don't answer *and* vote to close a question. Do one or the other. In this case, OP's issue is not reproducible, so a comment would be suitable. It's not the biggest problem, it's just that it sends mixed messages. So no need to delete the answer, just avoid in the future. – wjandrea May 31 '22 at 20:32