
I am reading a Parquet file in a C++ program on Windows. The .parquet file has a column named "timestamp" whose data looks like "2021-04-06 16:48:04.614365+00:00". Presently I use int64_t as the data type to read the timestamp, and it reads the timestamp as "1671205687722819000".

To read an integer I use this code:

std::shared_ptr<parquet::RowGroupReader> row_group_1 = demoObj->myParquetReader->RowGroup(0);
std::shared_ptr<parquet::ColumnReader> colVal = row_group_1->Column(4);
parquet::Int64Reader* readerVal = static_cast<parquet::Int64Reader*>(colVal.get());
int64_t ReadVal;   // an INT64 column needs a 64-bit buffer, not int
int64_t valRead;   // number of values actually read
int64_t rowsRead = readerVal->ReadBatch(1, nullptr, nullptr, &ReadVal, &valRead);

How do I read the date time?

RKum
  • You could try converting it to a time_t and see if you get any useful output. Granted, time_t isn't guaranteed to be 64-bit (https://stackoverflow.com/questions/471248/what-is-time-t-ultimately-a-typedef-to). Once you have that, you should be able to manipulate it in your code using the C/C++ time libraries. – Andy Mar 10 '23 at 09:16
  • Thank you Andy. I will check that. If Parquet provides a data type or function for this, the conversion will be faster. – RKum Mar 10 '23 at 09:54
  • Looks like time is stored in microseconds, so divide by 1'000'000 to get an epoch timestamp. – Botje Mar 10 '23 at 12:14

1 Answer


The interpretation of timestamp columns depends on the type parameters. From the Parquet format documentation:

The TIMESTAMP type has two type parameters:

  • isAdjustedToUTC must be either true or false.
  • unit must be one of MILLIS, MICROS or NANOS. This list is subject to potential expansion in the future. Upon reading, unknown units must be handled as unsupported features (rather than as errors in the data files).

These parameters should be present in the LogicalType metadata, with a backwards-compatible definition in ConvertedType metadata.
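A minimal sketch of honoring the unit parameter with only the standard library, assuming you have already read the raw INT64 value and know the column's unit from the metadata. The `TimestampUnit` enum, `to_epoch_ns` and `to_epoch_seconds` are hypothetical names for illustration; in Arrow's parquet-cpp the unit should be obtainable from the column descriptor's logical type (`parquet::TimestampLogicalType::time_unit()`), but verify that against the library version you build with.

```cpp
#include <chrono>
#include <cstdint>
#include <stdexcept>

// Hypothetical enum mirroring the TIMESTAMP "unit" type parameter.
enum class TimestampUnit { MILLIS, MICROS, NANOS };

// Normalize a raw INT64 timestamp to nanoseconds since the Unix epoch.
std::chrono::nanoseconds to_epoch_ns(std::int64_t raw, TimestampUnit unit) {
    using namespace std::chrono;
    switch (unit) {
        case TimestampUnit::MILLIS: return duration_cast<nanoseconds>(milliseconds(raw));
        case TimestampUnit::MICROS: return duration_cast<nanoseconds>(microseconds(raw));
        case TimestampUnit::NANOS:  return nanoseconds(raw);
    }
    // The spec says unknown units are unsupported features, not bad data.
    throw std::invalid_argument("unsupported TIMESTAMP unit");
}

// Whole seconds since the epoch (the value a 64-bit std::time_t would hold).
std::int64_t to_epoch_seconds(std::int64_t raw, TimestampUnit unit) {
    // floor (C++17) rounds pre-1970 values down instead of toward zero.
    return std::chrono::floor<std::chrono::seconds>(to_epoch_ns(raw, unit)).count();
}
```

`std::chrono::floor` is used instead of `duration_cast` for the seconds step so that negative (pre-1970) timestamps keep a non-negative sub-second remainder; this needs `/std:c++17` or later on MSVC.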

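The value in your post, 1671205687722819000, has 19 digits, which matches nanoseconds since the Unix epoch (the trailing 000 suggests the source data only had microsecond precision), so dividing by 1'000'000'000 gives a Unix epoch in seconds: 1671205687, i.e. 2022-12-16 15:48:07 UTC. But to know the unit for sure, and whether the value is UTC-adjusted, you need to query the metadata anyway, so do it right from the start and make your code handle the types according to the metadata.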
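To print the value in the same form as the file's "2021-04-06 16:48:04.614365+00:00", here is a minimal sketch assuming the value has already been normalized to nanoseconds since the epoch. `format_utc_micros` is a hypothetical helper; it uses `std::gmtime`, which is not thread-safe (MSVC's `gmtime_s` and POSIX `gmtime_r` are re-entrant alternatives, with differing signatures).

```cpp
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

// Format nanoseconds since the Unix epoch as "YYYY-MM-DD HH:MM:SS.ffffff" (UTC fields).
std::string format_utc_micros(std::int64_t epoch_ns) {
    // Split into whole seconds and a non-negative nanosecond remainder.
    std::int64_t secs = epoch_ns / 1'000'000'000;
    std::int64_t frac_ns = epoch_ns % 1'000'000'000;
    if (frac_ns < 0) { frac_ns += 1'000'000'000; --secs; }

    std::time_t t = static_cast<std::time_t>(secs);
    std::tm* utc = std::gmtime(&t);  // may return nullptr (MSVC rejects pre-1970 values)
    if (!utc) return {};

    char date[32];
    std::strftime(date, sizeof date, "%Y-%m-%d %H:%M:%S", utc);

    // Append the fractional part with microsecond precision.
    char out[48];
    std::snprintf(out, sizeof out, "%s.%06lld", date, static_cast<long long>(frac_ns / 1000));
    return out;
}
```

If `isAdjustedToUTC` is false, the Parquet spec treats the value as a zone-less local wall-clock time, so the same UTC-field formatting yields the wall-clock reading, but the output should not be labelled `+00:00`.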

Botje