This answer follows your lead in using loadtxt, and hopefully explains what you got, along with some alternatives. But if you aren't doing any calculations, it may be simpler to just read each line, split it, and write it back in the desired format. A csv reader may make that task simpler, but is not essential. Plain Python line reads and writes, plus string manipulation, will work.
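As a rough sketch of the csv route mentioned above (not the only way to do it): the sample text is pasted into a string so the snippet runs on its own, the helper name `reformat_rows` is made up for illustration, and the output layout (no spaces after commas) is an assumption about what you want.

```python
import csv
import io
from datetime import datetime

# Inline copy of the sample; with a real file, pass open(filename, newline="").
sample = """name, lat, lon, alt, time
id1, 40.436047, -74.814883, 33000, 2016-01-21T08:08:00Z
id2, 40.436047, -74.814883, 33000, 2016-01-21T08:08:00Z
"""

def reformat_rows(infile):
    """Yield rows with the timestamp split into six separate fields."""
    # skipinitialspace drops the blank that follows each comma
    reader = csv.reader(infile, skipinitialspace=True)
    next(reader)  # skip the header line
    for name, lat, lon, alt, stamp in reader:
        t = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        yield [name, lat, lon, alt] + t.strftime("%Y,%m,%d,%H,%M,%S").split(",")

rows = list(reformat_rows(io.StringIO(sample)))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue(), end="")
```

The numeric fields are passed through as text, so nothing is lost to float rounding.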
============
Using a string copy of your sample (bytestring in PY3):
In [296]: txt=b"""name, lat, lon, alt, time
...: id1, 40.436047, -74.814883, 33000, 2016-01-21T08:08:00Z
...: id2, 40.436047, -74.814883, 33000, 2016-01-21T08:08:00Z""".splitlines(
...: )
In [297]: txt
Out[297]:
[b'name, lat, lon, alt, time',
b'id1, 40.436047, -74.814883, 33000, 2016-01-21T08:08:00Z',
b'id2, 40.436047, -74.814883, 33000, 2016-01-21T08:08:00Z']
In [298]: data=np.loadtxt(txt,delimiter=',',dtype=np.string_,skiprows=1)
In [299]: data
Out[299]:
array([[b'id1', b' 40.436047', b' -74.814883', b' 33000',
b' 2016-01-21T08:08:00Z'],
[b'id2', b' 40.436047', b' -74.814883', b' 33000',
b' 2016-01-21T08:08:00Z']],
dtype='|S21')
In [300]: data[:,4]
Out[300]:
array([b' 2016-01-21T08:08:00Z', b' 2016-01-21T08:08:00Z'],
dtype='|S21')
Or with unpack=True:
In [302]: name,lat,lon,alt,time=np.loadtxt(txt,delimiter=',',dtype=np.string_,sk
...: iprows=1,unpack=True)
In [303]: time
Out[303]:
array([b' 2016-01-21T08:08:00Z', b' 2016-01-21T08:08:00Z'],
dtype='|S21')
We've loaded the file as a 2d array of strings, or as 5 1d arrays. time is an array of strings.
I can convert this array of strings into an array of datetime64 values:
In [307]: time1 = time.astype(np.datetime64)
In [308]: time1
Out[308]: array(['2016-01-21T08:08:00', '2016-01-21T08:08:00'], dtype='datetime64[s]')
In [309]: time1[0]
Out[309]: numpy.datetime64('2016-01-21T08:08:00')
I could even load it directly with datetimes. But this doesn't solve your display issues.
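The string-to-datetime64 conversion can also be tried as a standalone script. This is only a sketch: the values are typed in as str rather than loaded from the file, and stripping the leading blank and the trailing 'Z' first is my assumption, since newer NumPy versions may warn about timezone markers in the string.

```python
import numpy as np

# Sample values shaped like the `time` column above (str rather than bytes).
raw = np.array([" 2016-01-21T08:08:00Z", " 2016-01-21T08:08:00Z"])

# Drop surrounding whitespace, then the trailing 'Z' (UTC marker).
clean = np.char.rstrip(np.char.strip(raw), "Z")

# Convert the cleaned strings to second-resolution datetime64 values.
times = clean.astype("datetime64[s]")
print(times)
```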
=====================
genfromtxt gives more power for loading different column types:
In [312]: np.genfromtxt(txt,dtype=None,skip_header=1,delimiter=',')
Out[312]:
array([(b'id1', 40.436047, -74.814883, 33000, b' 2016-01-21T08:08:00Z'),
(b'id2', 40.436047, -74.814883, 33000, b' 2016-01-21T08:08:00Z')],
dtype=[('f0', 'S3'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<i4'), ('f4', 'S21')])
This gives a mix of strings, floats and ints. The dates are still strings.
If I replace the dtype=None with a specific dtype, I can get dates as before:
In [313]: dt=['S3','f','f','i','datetime64[s]']
In [315]: data=np.genfromtxt(txt,dtype=dt,skip_header=1,delimiter=',')
In [316]: data
Out[316]:
array([ (b'id1', 40.4360466003418, -74.81488037109375, 33000, datetime.datetime(2016, 1, 21, 8, 8)),
(b'id2', 40.4360466003418, -74.81488037109375, 33000, datetime.datetime(2016, 1, 21, 8, 8))],
dtype=[('f0', 'S3'), ('f1', '<f4'), ('f2', '<f4'), ('f3', '<i4'), ('f4', '<M8[s]')])
In [317]: data['f4']
Out[317]: array(['2016-01-21T08:08:00', '2016-01-21T08:08:00'], dtype='datetime64[s]')
===============
A first cut at writing this back out to file
In [318]: np.savetxt('test.txt',data,fmt='%4s, %.5f, %.5f, %d, %s')
In [320]: cat test.txt
b'id1', 40.43605, -74.81488, 33000, 2016-01-21T08:08:00
b'id2', 40.43605, -74.81488, 33000, 2016-01-21T08:08:00
Controlling the float precision is obvious. I still need to fix the display of the first byte string. And it does not split the date; I'm just displaying its normal string representation.
=================
You can convert the np.datetime64 array into an array of datetime objects:
In [361]: from datetime import datetime
In [362]: data['f4'].astype(datetime)
Out[362]:
array([datetime.datetime(2016, 1, 21, 8, 8),
datetime.datetime(2016, 1, 21, 8, 8)], dtype=object)
I can convert this into an array of strings with comma delimiter:
In [383]: tfmt='%Y, %m, %d, %H, %M, %S'
In [384]: timefld=data['f4'].astype(datetime)
In [385]: timefld = np.array([d.strftime(tfmt) for d in timefld])
In [386]: timefld
Out[386]:
array(['2016, 01, 21, 08, 08, 00', '2016, 01, 21, 08, 08, 00'],
dtype='<U24')
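One possible way to put the pieces together is to format each record in plain Python instead of going through savetxt. Treat this as a sketch: the `data` array below is a hand-built stand-in for the one loaded earlier, with the name field stored as str ('U3') so it prints without the b'' wrapper, and the row format string is just my guess at the layout you want.

```python
import numpy as np

# Stand-in for the loaded array (assumption: 'U3' names, float64 coordinates).
dt = [("f0", "U3"), ("f1", "f8"), ("f2", "f8"), ("f3", "i4"), ("f4", "datetime64[s]")]
stamp0 = np.datetime64("2016-01-21T08:08:00")
data = np.array(
    [("id1", 40.436047, -74.814883, 33000, stamp0),
     ("id2", 40.436047, -74.814883, 33000, stamp0)],
    dtype=dt,
)

tfmt = "%Y, %m, %d, %H, %M, %S"
lines = []
for row in data:
    # .item() turns a second-resolution datetime64 into a datetime.datetime
    stamp = row["f4"].item().strftime(tfmt)
    lines.append("%s, %.5f, %.5f, %d, %s"
                 % (row["f0"], row["f1"], row["f2"], row["f3"], stamp))

print("\n".join(lines))
```

Writing `"\n".join(lines)` to an open file takes the place of the savetxt call.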
=========================
A pure text editing approach could use functions like
def foo(dtstr):
    return dtstr.replace(b'-',b', ').replace(b':',b', ').replace(b'T',b', ').replace(b'Z',b'')

def foo(dtstr):
    # cleaner version with re
    import re
    return re.sub(b'[-:T]',b', ',dtstr[:-1])

def editline(aline):
    aline=aline.split(b',')
    aline[4]=foo(aline[4])
    return b', '.join(aline)
In [408]: [editline(aline) for aline in txt[1:]]
Out[408]:
[b'id1, 40.436047, -74.814883, 33000, 2016, 01, 21, 08, 08, 00',
b'id2, 40.436047, -74.814883, 33000, 2016, 01, 21, 08, 08, 00']