2

I have a DataFrame of time series with second resolution, which I save into a CSV file:

import pandas as pd

dr = pd.date_range('01/01/2020 9:00', '01/01/2020 9:10', freq='1s')
df = pd.DataFrame(index=dr, data=range(len(dr)))
df.to_csv('some_data.csv', header=False)

I can then open it in Excel and everything looks good:

enter image description here

If I then save the file in Excel (without changing anything), on reopening the seconds are rounded down to 0:

enter image description here

Looking at the Excel-saved file in notepad shows that the seconds have been lost:

1/1/2020 9:00,0
1/1/2020 9:00,1
1/1/2020 9:00,2
...

The change occurs when saving as a copy or overwriting. And oddly, if you keep the original document open after saving it, you will still see the seconds preserved, until you close and reopen.

In context, I am writing files for other (non-Python-using) users who will likely be using Excel to interact with their data. And even resizing a column prompts for a save, so I find it very likely that they could be inadvertently losing data.

While I used Python to create this example, I have seen the issue with timestamps written in other languages as well.

Is there a better way to be logging time series data that prevents this? Or is there a fix in Excel (preferably a permanent option or setting) that I could share with users who get data in this format?

Tom
  • 8,310
  • 2
  • 16
  • 36
  • under cell formatting, you can give it a custom mode that includes seconds – Mohammad Athar Jun 30 '20 at 20:34
  • @MohammadAthar Athar Thank you for that suggestion; I updated to respond to this, it seems like you have to do so every time you open the file? If that's the case, that still seems fairly susceptible to data loss (even if users know that's a fix, will they remember?) – Tom Jun 30 '20 at 20:51
  • 1
    How about saving directly with pandas `to_excel`, instead of `to_csv`? Does that give you the same problem? https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html – vmouffron Jun 30 '20 at 22:27
  • How are you saving the file? If you are saving it as a csv file, then the date/time field will be saved as formatted. But if you are saving it as an Excel file, the underlying value should be saved. – Ron Rosenfeld Jul 01 '20 at 01:05
  • 1
    Related: https://stackoverflow.com/questions/27801193/how-to-prevent-excel-from-truncating-numbers-in-a-csv-file - seems like another case where excel tries to "help" you. And ends up being a pain. – FObersteiner Jul 01 '20 at 05:18

1 Answers1

0

Posting an answer based on the comments:

Save as XLSX rather than CSV

Save directly into Excel format (df.to_excel() instead of df.to_csv()) or save your CSV into Excel format from Excel. This will preserve the timestamps and not require any additional formatting. Credit to vmouffron and Ron for this approach.

Format the data of the CSV in Excel

The other option is to use the "Format Cell" option (for the cells or for the whole date column) with a "Custom" format (in this case, changing the template "m/d/yyyy h:mm" to "m/d/yyyy h:mm:ss"). This option allows you to retain the CSV format and the full extent of the time data. Credit to Mohammad for this answer.

Each of these options has a small downside. In the first, you have to lock users into using Excel to view the data. But there are many free tools for converting the data from one format to the other, so this is easy for users to handle. For the second option, the formatting is not permanent and has to be done every time the file is opened. This seems less convenient for users, but still may be useful in some situations.

Tom
  • 8,310
  • 2
  • 16
  • 36