I have a very simple question: what is the most efficient way to read different entries from a text file with Python?

Suppose I've a text file like:

42017     360940084.621356  21.00  09/06/2015  13:08:04
42017     360941465.680841  29.00  09/06/2015  13:31:05
42017     360948446.517761  16.00  09/06/2015  15:27:26
42049     361133954.539315  31.00  11/06/2015  18:59:14
42062     361208584.222483  10.00  12/06/2015  15:43:04
42068     361256740.238150  19.00  13/06/2015  05:05:40

In C I would do:

while(fscanf(file_name, "%d %lf %f %d/%d/%d %d:%d:%d", &id, &t0, &score, &day, &month, &year, &hour, &minute, &second) != EOF){...some instruction...}

What would be the best way to do something like this in Python? I want to store every value in a separate variable, since I have to work with those variables throughout the code.

Thanks in advance!

The6thSense
urgeo
  • possible duplicate of [Python fastest way to read a large text file (several GB)](http://stackoverflow.com/questions/14944183/python-fastest-way-to-read-a-large-text-file-several-gb) – user3636636 Jul 21 '15 at 09:53
  • Do you want a list of strings, or a list of values typed according to each column? – FunkySayu Jul 21 '15 at 09:53
  • You could look at Numpy [loadtxt](http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) – Mel Jul 21 '15 at 10:13

3 Answers

I think muddyfish's answer is good; here is another way (maybe a bit lighter):

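import time

file_name = "data.txt"  # assumed path to the input file

with open(file_name) as f:
    for line in f:
        # Each line holds five whitespace-separated fields
        identifier, t0, score, date, hour = line.split()

        # You can also get a struct_time from the date and time strings
        timer = time.strptime(date + " " + hour, "%d/%m/%Y %H:%M:%S")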
FunkySayu
  • Note that id is the name of a built-in function (not a reserved word, but shadowing it is best avoided). If you want to use it as an identifier, use id_ = value instead – muddyfish Jul 21 '15 at 09:59
  • Thanks FunkySayu! I also ended up with something similar. Since I need each single entry (day, month, year, etc.), I was wondering whether there is a faster way, or do I have to use line.split("/") and line.split(":") again? – urgeo Jul 21 '15 at 10:02
  • The point is that I've got to work with each single entry (like make operations with the t0 and the different days and months), so I need to store data into different variables – urgeo Jul 21 '15 at 10:09
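
To get each single entry (day, month, year, and so on) into its own variable, as asked in the comments above, one option is to parse the date and time once with datetime.strptime and read the parts off the resulting object, instead of splitting on "/" and ":" again. This is only a minimal sketch: it assumes every line has exactly five whitespace-separated fields as in the sample data, and parse_line is a hypothetical helper name, not something from the answer above.

```python
from datetime import datetime


def parse_line(line):
    """Split one record into typed values (hypothetical helper)."""
    identifier, t0, score, date, clock = line.split()
    # Parse date and time together; the format matches e.g. 09/06/2015 13:08:04
    stamp = datetime.strptime(date + " " + clock, "%d/%m/%Y %H:%M:%S")
    return int(identifier), float(t0), float(score), stamp


# One line copied from the sample data in the question
sample = "42017     360940084.621356  21.00  09/06/2015  13:08:04"

identifier, t0, score, stamp = parse_line(sample)

# The individual entries are attributes of the datetime object
day, month, year = stamp.day, stamp.month, stamp.year
hour, minute, second = stamp.hour, stamp.minute, stamp.second
```

Inside a loop over the file you would call parse_line(line) for each line. Converting with int() and float() is the closest analogue to the %d, %lf and %f conversions in the fscanf call.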

I would look up the str.split() method.

For example, you could use:

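with open(file_name) as f:
    for line in f:
        # split() with no argument collapses runs of spaces and drops the newline
        data = dict(zip(("id", "t0", "score", "date", "time"), line.split()))
        instructions()  # placeholder for whatever you do with data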
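
One detail worth checking with data like the sample in the question, where columns are separated by runs of spaces: split(" ") and split() behave differently. The sketch below just illustrates that difference on one line copied from the question; the variable names are made up for the example.

```python
# One line from the sample data, including the trailing newline a file would give
line = "42017     360940084.621356  21.00  09/06/2015  13:08:04\n"

# Splitting on a single space yields empty strings between consecutive spaces
# and leaves the newline attached to the last field
by_space = line.split(" ")

# Splitting with no argument treats any run of whitespace as one separator
by_whitespace = line.split()

keys = ("id", "t0", "score", "date", "time")
data = dict(zip(keys, by_whitespace))
```

With split(" "), the empty strings would be zipped against the keys, so most values would land under the wrong key; split() avoids that.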
muddyfish

Depending on what you want to do with the data, pandas may be something to look into:

import pandas as pd

with open(file_name) as infile:
    df = pd.read_fwf(infile, header=None, parse_dates=[[3, 4]], 
        date_parser=lambda x: pd.to_datetime(x, format='%d/%m/%Y %H:%M:%S'))

The double list [[3, 4]], together with the date_parser argument, will read columns 3 and 4 (0-indexed) as a single datetime object. You can then access individual parts of that column with

>>> df['3_4'].dt.hour
0    13
1    13
2    15
3    18
4    15
5     5
dtype: int64

(If you don't like the '3_4' key, use the parse_dates argument above as follows:

parse_dates={'timestamp': [3, 4]}

)

read_fwf is for reading fixed-width columns, which your data seems to adhere to. Alternatively, there are functions such as read_csv and read_table, among others.

(This answer is pretty much a duplicate of this SO answer, but since this question here is more general, I've put this here as another answer, not as a comment.)

Community