
I have a DataFrame and I want to convert its index to a NumPy array.

>>> df.index
DatetimeIndex(['2018-02-28 01:00:00', '2018-02-28 01:01:00',
            '2018-02-28 01:02:00', '2018-02-28 01:03:00',
            '2018-02-28 01:04:00', '2018-02-28 01:05:00',
            '2018-02-28 01:06:00', '2018-02-28 01:07:00',
            '2018-02-28 01:08:00', '2018-02-28 01:09:00',
            ...
            '2018-02-28 17:25:00', '2018-02-28 17:26:00',
            '2018-02-28 17:27:00', '2018-02-28 17:28:00',
            '2018-02-28 17:29:00', '2018-02-28 17:30:00',
            '2018-02-28 17:31:00', '2018-02-28 17:32:00',
            '2018-02-28 17:33:00', '2018-02-28 17:34:00'],
            dtype='datetime64[ns]', name='date', length=995, freq='T')

But if I simply convert it, the format changes.

>>> np.array(df.index)
array(['2018-02-28T01:00:00.000000000', '2018-02-28T01:01:00.000000000',
    '2018-02-28T01:02:00.000000000', '2018-02-28T01:03:00.000000000',
    '2018-02-28T01:04:00.000000000', '2018-02-28T01:05:00.000000000',
    ...
    '2018-02-28T17:30:00.000000000', '2018-02-28T17:31:00.000000000',
    '2018-02-28T17:32:00.000000000', '2018-02-28T17:33:00.000000000',
    '2018-02-28T17:34:00.000000000'], dtype='datetime64[ns]')

It seems 2018-02-28 01:00:00 and 2018-02-28T01:00:00.000000000 are not the same. How can I keep the format?

maynull
  • Why is the format important? Numpy isn't actually storing a string of 0's. It's just a string representation for display purposes, e.g. 15.0000001 might sometimes be displayed instead of 15.0 due to float representation. You can easily set formatting when you need to `print` an array. Is that what you need? – jpp Mar 03 '18 at 11:21
  • @jpp Thank you for the comment. Actually, I want to make a dictionary the keys of which are the same as df.index. ```dic_timeToIndex = dict(zip( np.array(df.index), array_anotherIndex ))``` but in this case, the format changes and so does the ```keys```. That's why I want to keep the format. – maynull Mar 03 '18 at 11:43
  • I still don't understand. Why is it not a good idea to have your dictionary keys as timestamps? There is no requirement that dictionary keys must be strings / specially formatted for printing. – jpp Mar 03 '18 at 11:44
  • I've added a solution where your keys are timestamps. – jpp Mar 03 '18 at 12:17
  • @jpp Thank you! After reading your comments, I looked into my code again and realized that using timestamps is simpler than my original idea. I appreciate you helping me! :) – maynull Mar 03 '18 at 12:41
  • Meta on this question: https://meta.stackoverflow.com/questions/364079/how-to-handle-equivalent-answers-but-different-explanations – jpp Mar 03 '18 at 14:35

2 Answers


First, 2018-02-28 01:00:00 and 2018-02-28T01:00:00.000000000 are the same value; the second is just how NumPy displays a datetime64[ns].

Under the hood, datetimes are stored as 64-bit integers counting nanoseconds since the Unix epoch:

c = a.values.astype(np.int64)  # a is a DatetimeIndex such as df.index
print (c)
[1519779600000000000 1519779660000000000 1519779720000000000
 1519779780000000000 1519779840000000000 1519779900000000000
 1519779960000000000 1519780020000000000 1519780080000000000
 1519780140000000000 1519838700000000000 1519838760000000000
 1519838820000000000 1519838880000000000 1519838940000000000
 1519839000000000000 1519839060000000000 1519839120000000000
 1519839180000000000 1519839240000000000]

You can also check this for more info.


If you want strings:

b = df.index.astype(str).values

Or:

b = df.index.strftime('%Y-%m-%d %H:%M:%S')

print (b)
['2018-02-28 01:00:00' '2018-02-28 01:01:00' '2018-02-28 01:02:00'
 '2018-02-28 01:03:00' '2018-02-28 01:04:00' '2018-02-28 01:05:00'
 '2018-02-28 01:06:00' '2018-02-28 01:07:00' '2018-02-28 01:08:00'
 '2018-02-28 01:09:00' '2018-02-28 17:25:00' '2018-02-28 17:26:00'
 '2018-02-28 17:27:00' '2018-02-28 17:28:00' '2018-02-28 17:29:00'
 '2018-02-28 17:30:00' '2018-02-28 17:31:00' '2018-02-28 17:32:00'
 '2018-02-28 17:33:00' '2018-02-28 17:34:00']

Another way is to cast to second precision, which truncates any millisecond, microsecond or nanosecond component:

b = df.index.values.astype('datetime64[s]')
print (b)
['2018-02-28T01:00:00' '2018-02-28T01:01:00' '2018-02-28T01:02:00'
 '2018-02-28T01:03:00' '2018-02-28T01:04:00' '2018-02-28T01:05:00'
 '2018-02-28T01:06:00' '2018-02-28T01:07:00' '2018-02-28T01:08:00'
 '2018-02-28T01:09:00' '2018-02-28T17:25:00' '2018-02-28T17:26:00'
 '2018-02-28T17:27:00' '2018-02-28T17:28:00' '2018-02-28T17:29:00'
 '2018-02-28T17:30:00' '2018-02-28T17:31:00' '2018-02-28T17:32:00'
 '2018-02-28T17:33:00' '2018-02-28T17:34:00']
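
To illustrate what that cast does to sub-second data, here is a small sketch. The timestamp with fractional seconds is a made-up value, not taken from the question's data.

```python
import numpy as np

# Hypothetical value with a sub-second component.
t_ns = np.datetime64('2018-02-28T01:00:00.123456789', 'ns')

# Casting to second precision drops the fractional part.
t_s = t_ns.astype('datetime64[s]')
print(t_s)          # 2018-02-28T01:00:00
print(t_s == t_ns)  # False

# A value that already falls on a whole second is unchanged by the cast.
whole = np.datetime64('2018-02-28T01:00:00', 'ns')
print(whole.astype('datetime64[s]') == whole)  # True
```

For the minute-spaced index in the question nothing is lost, but with finer-grained data the cast would change the values.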

EDIT: As mentioned in the comments, converting is not necessary, but it depends on what kind of keys you need:

i = pd.DatetimeIndex(['2018-02-28 01:00:00', '2018-02-28 01:01:00',
            '2018-02-28 01:02:00', '2018-02-28 01:03:00',
            '2018-02-28 01:04:00', '2018-02-28 01:05:00',
            '2018-02-28 01:06:00', '2018-02-28 01:07:00',
            '2018-02-28 01:08:00', '2018-02-28 01:09:00'])
df = pd.DataFrame(index=i)
print (df)
Empty DataFrame
Columns: []
Index: [2018-02-28 01:00:00, 2018-02-28 01:01:00, 2018-02-28 01:02:00, 
        2018-02-28 01:03:00, 2018-02-28 01:04:00, 2018-02-28 01:05:00, 
        2018-02-28 01:06:00, 2018-02-28 01:07:00, 2018-02-28 01:08:00, 
        2018-02-28 01:09:00]

Selecting by Timestamps:

d = dict(zip(df.index, np.arange(10)))
print (d)
{Timestamp('2018-02-28 01:00:00'): 0, Timestamp('2018-02-28 01:01:00'): 1, 
 Timestamp('2018-02-28 01:02:00'): 2, Timestamp('2018-02-28 01:03:00'): 3, 
 Timestamp('2018-02-28 01:04:00'): 4, Timestamp('2018-02-28 01:05:00'): 5, 
 Timestamp('2018-02-28 01:06:00'): 6, Timestamp('2018-02-28 01:07:00'): 7, 
 Timestamp('2018-02-28 01:08:00'): 8, Timestamp('2018-02-28 01:09:00'): 9}

print (d[pd.Timestamp('2018-02-28 01:00:00')])
0

print (d[pd.to_datetime('2018-02-28 01:00:00')])
0

Selecting by strings, the simplest option:

d1 = dict(zip(df.index.astype(str).values, np.arange(10)))
print (d1)
{'2018-02-28 01:00:00': 0, '2018-02-28 01:01:00': 1, '2018-02-28 01:02:00': 2, 
 '2018-02-28 01:03:00': 3, '2018-02-28 01:04:00': 4, '2018-02-28 01:05:00': 5, 
 '2018-02-28 01:06:00': 6, '2018-02-28 01:07:00': 7, '2018-02-28 01:08:00': 8, 
 '2018-02-28 01:09:00': 9}

d1 = dict(zip(df.index.strftime('%Y-%m-%d %H:%M:%S'), np.arange(10)))
print (d1)
{'2018-02-28 01:00:00': 0, '2018-02-28 01:01:00': 1, '2018-02-28 01:02:00': 2, 
 '2018-02-28 01:03:00': 3, '2018-02-28 01:04:00': 4, '2018-02-28 01:05:00': 5, 
 '2018-02-28 01:06:00': 6, '2018-02-28 01:07:00': 7, '2018-02-28 01:08:00': 8, 
 '2018-02-28 01:09:00': 9}

print (d1['2018-02-28 01:00:00'])
0

print (dict(zip(df.index.values.astype('datetime64[s]'), np.arange(10))))
{numpy.datetime64('2018-02-28T01:00:00'): 0, 
 numpy.datetime64('2018-02-28T01:01:00'): 1, 
 numpy.datetime64('2018-02-28T01:02:00'): 2, 
 numpy.datetime64('2018-02-28T01:03:00'): 3, 
 numpy.datetime64('2018-02-28T01:04:00'): 4, 
 numpy.datetime64('2018-02-28T01:05:00'): 5, 
 numpy.datetime64('2018-02-28T01:06:00'): 6, 
 numpy.datetime64('2018-02-28T01:07:00'): 7, 
 numpy.datetime64('2018-02-28T01:08:00'): 8, 
 numpy.datetime64('2018-02-28T01:09:00'): 9}
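
If the dictionary is only used for lookups by timestamp, a Series indexed by the same DatetimeIndex may serve the same purpose. This is an alternative sketch rather than part of the approach above; `idx` and `positions` are hypothetical names, and `np.arange(10)` stands in for the other index array.

```python
import numpy as np
import pandas as pd

# Hypothetical ten-row index like the one in the example above.
idx = pd.date_range('2018-02-28 01:00:00', periods=10, freq='min', name='date')

# Map each timestamp to a value without converting anything to strings.
positions = pd.Series(np.arange(10), index=idx)

ts = pd.Timestamp('2018-02-28 01:02:00')
print(positions.loc[ts])        # 2

# A plain dict with Timestamp keys is still one call away if needed.
as_dict = positions.to_dict()
print(as_dict[ts])              # 2
```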
jezrael

From the information you have provided, you don't need to convert datetime objects to strings. In fact, no type conversion is necessary.

Where possible, keep your data, inputs and outputs structured. Strings are often only useful for I/O.

import pandas as pd
from dateutil import parser

idx = pd.DatetimeIndex(['2018-02-28 01:00:00', '2018-02-28 01:01:00',
                        '2018-02-28 01:02:00', '2018-02-28 01:03:00',
                        '2018-02-28 01:04:00', '2018-02-28 01:05:00'],
                       dtype='datetime64[ns]', name='date')

values = [1, 2, 3, 4, 5, 6]

d = dict(zip(idx, values))

x = parser.parse('2018-02-28 01:02:00') 
# equivalently, x = pd.to_datetime('2018-02-28 01:02:00')

d[x]  # 3

Consider this fundamental problem with storing your keys as strings:

x = parser.parse('2018-02-28 01:02:00')
y = parser.parse('2018-02-28 1:02:00')

print(x == y)          # True

x_str = '2018-02-28 01:02:00'
y_str = '2018-02-28 1:02:00'

print(x_str == y_str)  # False
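
If the lookup text has to arrive as a string, one way around this is to parse it before using it as a key. The sketch below uses only the standard library; `to_key` and `lookup` are hypothetical names, and the fixed format string is an assumption about how the input is written.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'  # assumed input format

def to_key(text):
    """Parse a timestamp string so equivalent spellings give equal keys."""
    return datetime.strptime(text, FMT)

# Keys are datetime objects, not strings.
lookup = {to_key('2018-02-28 01:02:00'): 3}

# The hour is written without a leading zero, yet the lookup succeeds.
print(lookup[to_key('2018-02-28 1:02:00')])  # 3
```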
jpp