1

I'd like to read time strings and numeric data from a file. With loadtxt I can't read strings and numbers at the same time, because the strings cannot be converted to float. So I tried genfromtxt with delimiter=[]+[]+[] according to the columns I have, but the strings are read as nan. I'd like to read the time directly as a time array (date2num, datetime or similar) so that I can plot it correctly in matplotlib. What can I do? A sample of my file is below (the real file has many more rows):

GOES data for time interval: 20-Feb-2014 00:00:00.000 to 27-Feb-2014 00:00:00.000
Current time: 23-Mar-2014 21:52:00.00

Time at center of bin        1.0 - 8.0 A    0.5 - 4.0 A  Emission Meas           Temp
                              watts m^-2     watts m^-2    10^49 cm^-3             MK
20-Feb-2014 00:00:00.959     4.3439e-006    3.9946e-007        0.30841         10.793
20-Feb-2014 00:00:02.959     4.3361e-006    3.9835e-007        0.30801         10.789
20-Feb-2014 00:00:04.959     4.3413e-006    3.9501e-007        0.30994         10.743
20-Feb-2014 00:00:06.959     4.3361e-006    3.9389e-007        0.30983         10.735
20-Feb-2014 00:00:08.959     4.3361e-006    3.9278e-007        0.31029         10.722
20-Feb-2014 00:00:10.959     4.3387e-006    3.9278e-007        0.31058         10.719
20-Feb-2014 00:00:12.959     4.3361e-006    3.9278e-007        0.31029         10.722
20-Feb-2014 00:00:14.959     4.3361e-006    3.9055e-007        0.31122         10.695
20-Feb-2014 00:00:16.959     4.3334e-006    3.8721e-007        0.31234         10.657

Following the suggestions, I read the data using:

pd.read_csv('/filename', sep=r'\s\s+', header=5, engine='python',
            names=['time','band1','band2','emeas','temp'])

and the data is read, but there is one problem. When I print the frame, it shows:

                       time     band1  band2    emeas    temp
0  20-Feb-2014 00:00:03.005  0.000004      0  0.31000  10.866
1  20-Feb-2014 00:00:05.052  0.000004      0  0.31199  10.819
2  20-Feb-2014 00:00:07.102  0.000004      0  0.31190  10.811
3  20-Feb-2014 00:00:09.149  0.000004      0  0.31237  10.798
4  20-Feb-2014 00:00:11.199  0.000004      0  0.31266  10.795
5  20-Feb-2014 00:00:13.245  0.000004      0  0.31237  10.798
6  20-Feb-2014 00:00:15.292  0.000004      0  0.31334  10.770
7  20-Feb-2014 00:00:17.342  0.000004      0  0.31451  10.732
8  20-Feb-2014 00:00:19.389  0.000004      0  0.31451  10.732
9  20-Feb-2014 00:00:21.439  0.000004      0  0.31421  10.735

So apparently the values in band1 and band2 have been rounded. When I plot them they look correct (not rounded), so why do they look like that in the printed frame?

nandhos
  • Use `f = open('path/to/file/here')` and then work with `split()` or a regexp – w5e Apr 18 '14 at 20:49
  • http://pymotw.com/2/re/ <- is nice for regexp – w5e Apr 18 '14 at 20:59
  • @Tweek, an example would be good, thanks – nandhos Apr 18 '14 at 21:49
  • You might be able to use `genfromtxt` if you get the `dtype` argument right. This might also be helpful for reading strings with `genfromtxt`: http://stackoverflow.com/questions/12319969/how-to-use-numpy-genfromtxt-when-first-column-is-string-and-the-remaining-column – Kyle Neary Apr 18 '14 at 21:53
  • Do you know where I can find information about that? For example, is a 'space' the same as a tab? And do you know why, in the code above, band1 and band2 appear to be rounded when I print them, although they look all right when I plot them against the index? – nandhos Jul 24 '15 at 19:04
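
On the question in the last comment about spaces and tabs: in Python's `re` syntax, which as far as I know is what pandas applies when `sep` is a regular expression, `\s` matches any whitespace character, tabs included. The sketch below uses `re.split` on one of the sample rows to show how `\s+` and `\s\s+` differ. The variable names are only illustrative.

```python
import re

row = "20-Feb-2014 00:00:00.959     4.3439e-006    3.9946e-007        0.30841         10.793"

# \s+ splits on every run of whitespace, so the date and the time
# (separated by a single space) end up in different fields.
fields_any = re.split(r"\s+", row)

# \s\s+ needs at least two whitespace characters, so the single space
# inside the timestamp is kept and the row splits into five fields.
fields_two_plus = re.split(r"\s\s+", row)

# \s also matches tabs, so a tab followed by a space counts as a separator,
# while the single space between "a" and "b" does not.
fields_tab = re.split(r"\s\s+", "a b\t c")

print(fields_any)
print(fields_two_plus)
print(fields_tab)
```

This is why `\s\s+` suits this file: the columns are padded with several spaces, while the date and time are joined by exactly one.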

2 Answers

1

There are probably more elegant solutions using regular expressions, but this works too.

from datetime import datetime

with open("path/filename") as input_file:
    for line in input_file:
        line_parts = line.split()
        if len(line_parts) > 1:
            try:
                # Date and time are the first two fields; join them and
                # parse the result into a datetime object
                timestamp = datetime.strptime(line_parts[0] + " " + line_parts[1], "%d-%b-%Y %H:%M:%S.%f")
                # Do stuff with the data here (each value is stored separately in line_parts)
                # For instance, print everything.
                print("DateTime Object: " + str(timestamp))
                print("Data: " + str(line_parts[2:]))

                # Cast data to floats for use in arithmetic
                data_point_one = float(line_parts[2])
                print("data_point_one * 2 = " + str(data_point_one * 2))

            except ValueError:
                # Lines that don't start with a timestamp take this route...
                continue
Aaron
  • That's right, it reads the date and time as strings, but the numbers (data) are also strings now. They must be floats – nandhos Apr 19 '14 at 03:31
  • You can cast them to floats, I edited the above answer to show you how. – Aaron Apr 21 '14 at 14:20
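
Following up on the float conversion discussed in the comments, here is a minimal sketch that collects the timestamps and the numeric columns into lists rather than printing them. The function name `parse_goes_lines` and the constant `TIME_FORMAT` are hypothetical, and the sample lines are copied from the question.

```python
from datetime import datetime

TIME_FORMAT = "%d-%b-%Y %H:%M:%S.%f"


def parse_goes_lines(lines):
    """Return (timestamps, rows): datetimes and lists of floats, one per data line."""
    timestamps, rows = [], []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            stamp = datetime.strptime(parts[0] + " " + parts[1], TIME_FORMAT)
            values = [float(p) for p in parts[2:]]
        except ValueError:
            # Header lines and anything else that is not a data row
            continue
        timestamps.append(stamp)
        rows.append(values)
    return timestamps, rows


sample = [
    "GOES data for time interval: 20-Feb-2014 00:00:00.000 to 27-Feb-2014 00:00:00.000",
    "Time at center of bin        1.0 - 8.0 A    0.5 - 4.0 A  Emission Meas           Temp",
    "20-Feb-2014 00:00:00.959     4.3439e-006    3.9946e-007        0.30841         10.793",
    "20-Feb-2014 00:00:02.959     4.3361e-006    3.9835e-007        0.30801         10.789",
]
timestamps, rows = parse_goes_lines(sample)
print(timestamps[0], rows[0])
```

The `timestamps` list holds ordinary `datetime` objects, which matplotlib can plot directly or convert with `matplotlib.dates.date2num`. Note that `%b` depends on the locale, so this assumes English month abbreviations.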
1

You can use pandas.read_csv() because the sep parameter (equivalent to the delimiter in numpy.genfromtxt) accepts regular expressions. Then, with:

import pandas as pd

pd.read_csv('test.txt', sep=r'\s\s+', header=4, engine='python')

you will get the desired output.
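
To get from there to a real time column for plotting, the sketch below is one possible approach and not necessarily what the answer intended. It filters out the header lines by hand (an assumption made here to avoid counting header rows), reads the rest with the same `sep`, and converts the first column with `pd.to_datetime`. The sample text is copied from the question; the column names are the ones the asker chose.

```python
import io
import re

import pandas as pd

raw = """GOES data for time interval: 20-Feb-2014 00:00:00.000 to 27-Feb-2014 00:00:00.000
Current time: 23-Mar-2014 21:52:00.00

Time at center of bin        1.0 - 8.0 A    0.5 - 4.0 A  Emission Meas           Temp
                              watts m^-2     watts m^-2    10^49 cm^-3             MK
20-Feb-2014 00:00:00.959     4.3439e-006    3.9946e-007        0.30841         10.793
20-Feb-2014 00:00:02.959     4.3361e-006    3.9835e-007        0.30801         10.789
20-Feb-2014 00:00:04.959     4.3413e-006    3.9501e-007        0.30994         10.743
"""

# Keep only the lines that start with a date such as 20-Feb-2014
data_lines = [ln for ln in raw.splitlines()
              if re.match(r"\d{2}-[A-Za-z]{3}-\d{4} ", ln)]

df = pd.read_csv(
    io.StringIO("\n".join(data_lines)),
    sep=r"\s\s+",          # two or more whitespace characters
    header=None,
    names=["time", "band1", "band2", "emeas", "temp"],
    engine="python",       # regex separators need the Python parser
)

# Turn the time strings into datetime64 values
df["time"] = pd.to_datetime(df["time"], format="%d-%b-%Y %H:%M:%S.%f")
print(df.dtypes)
```

With the `time` column converted, something like `df.plot(x='time', y='band1')` should put proper dates on the x axis.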

Saullo G. P. Castro
  • Thanks @Saullo, I used pandas as you suggested, and I have edited my question above. I looked for documentation on regular expressions for `sep` but without success: does `sep='s*'` mean spaces of any length, and `sep='\s\s+'` two or more spaces? – nandhos Jul 24 '15 at 17:58
  • @nandhos correct, but whitespace of any length (including zero) would be `sep='\s*'` – Saullo G. P. Castro Jul 24 '15 at 18:57
  • Thanks @Saullo, just one more thing: do you know why the columns band1 and band2 appear to be rounded when I read the file with the corrected code above? When I plot the data it is all right; only when I print it do I get the frame shown above. – nandhos Jul 24 '15 at 21:20
  • @nandhos To be honest I don't know the reason, it could be some printing configuration of pandas, but I'm not sure... – Saullo G. P. Castro Jul 24 '15 at 21:34
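
On the rounding discussed in these comments: pandas formats floats for display according to its display options, which does not change the stored values. The sketch below builds a small frame from values in the question, checks that the stored value is intact, and prints it in scientific notation via the `display.float_format` option. How the default printout looks depends on the pandas version, so treat this as an illustration and not a diagnosis of the asker's exact setup.

```python
import pandas as pd

df = pd.DataFrame({
    "band1": [4.3439e-06, 4.3361e-06],
    "band2": [3.9946e-07, 3.9835e-07],
})

# Default printout: the exact formatting depends on pandas' display options
print(df)

# The stored value is untouched, whatever the printout looks like
print(df["band1"].iloc[0])

# Ask for scientific notation while printing, only inside this block
with pd.option_context("display.float_format", "{:.4e}".format):
    formatted = df.to_string()
print(formatted)
```

That would be consistent with the plot looking right while the printed frame looks rounded.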