Retrieving data from a .txt file using Python?

Question

Below you can see data from an ephemeris.txt file. I want to retrieve several columns (for example, the columns starting with 00:00, 27.69 and 44.1) and name the arrays x, y, z. What do I have to do?

I tried this

x, y, z = numpy.loadtxt("ephemeris.txt", unpack=True)

and got this error:

"ValueError: could not convert string to float: Date__(UT)__HR:MN"

Could you also help me convert HR:MN into minutes only?

Date__(UT)__HR:MN     R.A.__(a-apparent)__DEC\
**********************************************\
 2013-Jan-01 00:00 *   14 31 27.69 -12 29 44.1\
 2013-Jan-01 00:01 *   14 31 27.71 -12 29 44.1\
 2013-Jan-01 00:02 *   14 31 27.72 -12 29 44.2\
 2013-Jan-01 00:03 *   14 31 27.73 -12 29 44.2\
 2013-Jan-01 00:04 *   14 31 27.75 -12 29 44.3\
 2013-Jan-01 00:05 *   14 31 27.76 -12 29 44.3\
 2013-Jan-01 00:06 *   14 31 27.77 -12 29 44.4\
 2013-Jan-01 00:07 *   14 31 27.78 -12 29 44.4\
 2013-Jan-01 00:08 *   14 31 27.80 -12 29 44.4\
 2013-Jan-01 00:09 *   14 31 27.81 -12 29 44.5\

thanks in advance

Sorry, I didn't notice you were already using `numpy.loadtxt`. Take a look at my answer, then :o) – heltonbiker, Apr 09 '13 at 22:30

heltonbiker · Accepted Answer · 2013-04-09T22:35:13.983

1

You can use some more arguments of the loadtxt function.

The error you are getting is most probably due to the first two header lines, so skip them with the skiprows=2 argument.

Also, each row contains data in several formats, separated by spaces. Use delimiter=' ' just in case (every single space then counts as a separator, so a leading space or a run of spaces produces empty fields), and you can choose between dtype=str and dtype=object.

a = numpy.loadtxt("ephemeris.txt", delimiter=' ', dtype=str, skiprows=2)

This should give you a single array on which you can perform many kinds of "conversions": split it into one array per column, create a list of rows, etc.

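# the number of names on the left must match the number of columns (a.shape[1]);
# "etc" stands in for the remaining names
x,y,z,etc = numpy.hsplit(a, a.shape[1])
x = x.astype(some_type)  # some_type is a placeholder, e.g. float for a numeric column

# or
x = a[:,0].astype(some_type)
y = a[:,1].astype(some_other_type)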

or something along these lines...

Hope this helps, and please elaborate more in the comments if needed.
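
Putting the pieces above together, here is a minimal Python 3 sketch, not a definitive implementation. It assumes the file looks exactly like the sample in the question (two header lines, a trailing backslash on every row) and relies on loadtxt's default whitespace splitting instead of delimiter=' '. The helper name load_columns and the column indices 1, 5 and 8 are my own choices for that layout.

```python
import os
import tempfile

import numpy as np

# Sample rows copied from the question, including the trailing backslashes.
SAMPLE = (
    "Date__(UT)__HR:MN     R.A.__(a-apparent)__DEC\\\n"
    "**********************************************\\\n"
    " 2013-Jan-01 00:00 *   14 31 27.69 -12 29 44.1\\\n"
    " 2013-Jan-01 00:01 *   14 31 27.71 -12 29 44.1\\\n"
    " 2013-Jan-01 00:02 *   14 31 27.72 -12 29 44.2\\\n"
)


def load_columns(path):
    """Return (time strings, R.A. seconds, Dec. arcseconds) as arrays."""
    # With no delimiter given, loadtxt splits on any run of whitespace, so the
    # leading space and the spaces after '*' do not create empty fields.
    a = np.loadtxt(path, dtype=str, skiprows=2)
    x = a[:, 1]                                       # 'HH:MM' strings
    y = a[:, 5].astype(float)                         # seconds of R.A.
    z = np.char.rstrip(a[:, 8], "\\").astype(float)   # drop the backslash first
    return x, y, z


# Write the sample to a temporary file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ephemeris.txt")
    with open(path, "w") as f:
        f.write(SAMPLE)
    x, y, z = load_columns(path)
```

Indexing the columns directly avoids having to list one name per column, which is where the hsplit unpacking tends to go wrong.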

edited Apr 09 '13 at 22:35

answered Apr 09 '13 at 22:30

heltonbiker


`a = np.loadtxt("ephemeris.txt", delimiter=' ', dtype=string, skiprows=2)` gives the following error: "name 'string' is not defined". `a.shape` = 11. `x,y,z,etc = numpy.hsplit(a, a.shape)` gives the error: "ValueError: too many values to unpack". – user22180 Apr 10 '13 at 09:41
1) Sorry, it should be `dtype='string'` (the type name "string" must be in quotes). 2) `x,y,z,etc` wasn't meant to be taken literally; the number of arrays you create depends on how many columns you have in your txt file. Supposing you have 8 columns, you would do `b,c,d,e,f,g,h,i = numpy.hsplit(a, 8)`. Lastly, I have the impression that you are not familiar with Python or programming in general. Are you really UNDERSTANDING the explanation, or just copy-pasting code and trying to run it blindly? (No offense, but perhaps you need DEEPER help.) – heltonbiker Apr 10 '13 at 13:49
You are probably saying this because you saw 'etc', but I tried 'string' as well and it gives "ValueError: cannot set an array element with a sequence". After that I tried object instead of 'string' and got no error. But then `b,c,d,e,f,g,h,i,j,k,l = numpy.hsplit(a, a.shape)` gives the error "ValueError: need more than 2 values to unpack". I agree with you on one thing: I need better help to understand the errors. From the error "need more than 2 values to unpack", for instance, I understood nothing, and that is why I couldn't correct it. Is there any site which explains the different types of errors? – user22180 Apr 10 '13 at 21:19
You should have used `numpy.hsplit(a, a.shape[1])`. Without the `[1]` part it shouldn't work, because the second argument would be seen as a sequence of length two (the two dimensions of the array). With the index, it would be seen as an integer, corresponding to the second dimension of the array, that is, its number of columns. – heltonbiker Apr 10 '13 at 23:06
`a = numpy.loadtxt("ephemeris.txt", delimiter=' ', dtype='string', skiprows=2)` gives the following error: "ValueError: cannot set an array element with a sequence". If I write object instead of 'string', there is no error, but `a.shape[1]` gives "IndexError: tuple index out of range", while `a.shape[0]` is equal to 11. Please try to resolve the 'string' problem first. By the way, what I get is, for example, `b=array([ ('2013-Jan-01', '00:00', '*', '', '', '14', '31', '27.69', '-12', '29', '44.1\\')], dtype=object)`, which is not a column but the first row. – user22180 Apr 11 '13 at 07:10
I think from this point on you should be able to figure out these minor details, don't you think? For example, if you print the array (`print a`), what does it look like? – heltonbiker Apr 11 '13 at 13:44
OK, I will try to figure it out. But can you please help me understand the errors? Do I have to know the inner code of Python to understand them? – user22180 Apr 11 '13 at 16:48

Thank you. Now I have figured out the problem with "string": there was a space before the first column, that is, the column starting with 2013-Jan-01. – user22180 Apr 16 '13 at 10:00
Yeah, I noticed that too, but couldn't come up with working code for you. I have another suggestion: you have a column with "date" and another with "time". You could use the function `datetime.strptime` (from the datetime module) and apply it to the "zipped" results of the second and third columns to get a column of type "datetime". That way, you can perform mathematical operations on them, since they are not just strings anymore. – heltonbiker Apr 16 '13 at 12:54
And, since your date is formatted "YYYY-Mon-DD", you would not be able to use `datetime.strptime`, but you can use this (see second line of formats supported): http://code.google.com/p/parsedatetime/ – heltonbiker Apr 16 '13 at 13:02
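
As a hedged counterpoint, the `datetime` documentation lists a `%b` directive for the locale's abbreviated month name, so under an English (or default C) locale `datetime.strptime` may well handle "2013-Jan-01" directly. A minimal Python 3 sketch, assuming that locale and two rows copied from the question (the helper name `parse_row` is made up for illustration):

```python
from datetime import datetime


def parse_row(line):
    """Parse the date and time at the start of one ephemeris data row."""
    date_text, time_text = line.split()[:2]
    # %b is the locale's abbreviated month name ("Jan" in an English/C locale).
    return datetime.strptime(date_text + " " + time_text, "%Y-%b-%d %H:%M")


rows = [
    " 2013-Jan-01 00:00 *   14 31 27.69 -12 29 44.1\\",
    " 2013-Jan-01 00:09 *   14 31 27.81 -12 29 44.5\\",
]
stamps = [parse_row(row) for row in rows]

# Subtracting two datetime objects gives a timedelta, so arithmetic works.
elapsed_minutes = (stamps[1] - stamps[0]).total_seconds() / 60
```

Under a non-English locale the month abbreviation would probably not match, in which case a third-party parser is a reasonable fallback.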

jazzpi · Answer 2 · 2013-04-09T19:56:26.770

0

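import re

# raw string, so the regex engine sees the backslashes unchanged;
# \\? allows for the literal backslash that ends each data line
pattern = re.compile(r"(\d{4})-(\w{3})-(\d{2}) (\d{2}):(\d{2}) \*   (.*?)\\?\s*$")

with open("ephemeris.txt") as f:
    for line in f:
        r = pattern.search(line)
        if r:
            print("Year: " + r.group(1))
            print("Month: " + r.group(2))
            print("Day: " + r.group(3))
            print("Hour: " + r.group(4))
            print("Minute: " + r.group(5))
            print("Data: " + r.group(6))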

This will read every line of the file, check if it matches the pattern and if it does so, print all the data it could retrieve.
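
If the values are needed for further work rather than printed, the same matching idea can collect them into records. The following is a Python 3 sketch under the assumption that every data row follows the layout shown in the question; the names `ROW` and `parse_lines` and the dict keys are hypothetical, chosen only for this example.

```python
import re

# Raw string: \\? matches the literal backslash ending each row, if present.
ROW = re.compile(
    r"(\d{4})-(\w{3})-(\d{2}) (\d{2}):(\d{2}) \*\s+(.*?)\\?\s*$"
)


def parse_lines(lines):
    """Return one dict per line that matches the ephemeris row pattern."""
    records = []
    for line in lines:
        m = ROW.search(line)
        if m:
            year, month, day, hour, minute, data = m.groups()
            records.append({
                "year": int(year), "month": month, "day": int(day),
                "hour": int(hour), "minute": int(minute),
                "data": data.split(),   # the six R.A./Dec. fields as strings
            })
    return records


# Two header lines and two data rows copied from the question.
sample = [
    "Date__(UT)__HR:MN     R.A.__(a-apparent)__DEC\\\n",
    "**********************************************\\\n",
    " 2013-Jan-01 00:00 *   14 31 27.69 -12 29 44.1\\\n",
    " 2013-Jan-01 00:01 *   14 31 27.71 -12 29 44.1\\\n",
]
records = parse_lines(sample)
```

The header lines simply fail to match and are skipped, so no explicit line counting is needed.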

edited Apr 09 '13 at 19:56

answered Apr 09 '13 at 19:14

jazzpi


sorry , this shows the following error: error: bogus escape (end of line) – user22180 Apr 09 '13 at 19:51
@user22180: Fixed, forgot a \n at the end of the pattern :( – jazzpi Apr 09 '13 at 19:57
could you be more specific? – user22180 Apr 10 '13 at 09:15

score 0 · Answer 3 · answered Apr 09 '13 at 19:46

You can also split each line on a separator character. Then you can access each (string) token by index:

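def prova(path='/home/frenk/Desktop/ephemeris.txt'):
    with open(path) as f:
        for line in f:
            l = line.split()    # split on any run of whitespace
            if len(l) < 9:      # data rows have 9 tokens; header lines have fewer
                continue
            print("date: " + l[0])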

Second, if you want to convert a string like "31" to integer 31, you can simply write:

x = int('31')

Note that you can select part of a string using slice notation:

string = "This is a slice of string"
print(string[10:15])  # slice
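
Combining the splitting and `int` conversion above gives one way to turn HR:MN into minutes, as asked in the question. This is a minimal sketch; the function name `to_minutes` is my own, and the "13:45" value is a made-up input that does not appear in the file.

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# One data row copied from the question.
line = " 2013-Jan-01 00:07 *   14 31 27.78 -12 29 44.4\\"
tokens = line.split()               # split on any run of whitespace
row_minutes = to_minutes(tokens[1]) # tokens[1] is the 'HH:MM' field

# Hypothetical value, only to show the hour part being counted.
afternoon_minutes = to_minutes("13:45")
```

Applying `to_minutes` to every row's second token yields the minute column as a list of integers.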
