
I want to analyse a temp file (it has the .txt extension) in real time. The temp file has the format:

6000 -64.367700E+0 19.035500E-3
8000 -64.367700E+0 18.989700E-3

However, after importing and printing it, the data is not a matrix as I hoped, but a single string:

'6000\t-64.367700E+0\t19.035500E-3\n8000\t-64.367700E+0\t18.989700E-3'

I tried importing it line by line, but since it's in string format I couldn't get xreadlines() or readlines() to work. I can split the string and then separate the data into an appropriate list for analysis, but are there any suggestions for dealing only with new data? As the file gets larger, regularly reprocessing all the data will slow the code down, and I can't work out how to replicate an xreadlines() loop.
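One way to "only deal with new data" is to keep the file handle open and poll for appended lines, tail -f style, instead of re-reading the whole file. A minimal sketch (the function name, path, and poll interval are placeholders, not from the question):

```python
import time

def follow(path, poll_interval=1.0):
    """Yield each new line appended to the file as parsed fields.

    readline() returns '' at end of file, so we sleep and retry;
    lines appended later are picked up on the next read.
    """
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:
                time.sleep(poll_interval)  # no new data yet; wait and retry
                continue
            fields = line.split()          # '6000\t-64.3677E+0\t...' -> 3 fields
            yield int(fields[0]), float(fields[1]), float(fields[2])
```

Each yielded tuple can then be fed to the analysis step, so only data written since the last read is ever processed.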

Thanks for any help

Ivan Kolesnikov
mattjd

1 Answer


Have you tried pandas.read_csv?

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

You can specify the separator, which here is \t.
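For example, with the data from the question (the column names below are placeholders for illustration, since the file has no header row):

```python
import io
import pandas as pd

# Sample data in the question's format: tab-separated, no header row.
sample = "6000\t-64.367700E+0\t19.035500E-3\n8000\t-64.367700E+0\t18.989700E-3\n"

# sep="\t" tells read_csv the fields are tab-separated;
# in practice you would pass the path to the temp file instead of a StringIO.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None,
                 names=["step", "value_a", "value_b"])
print(df)
```

This yields a numeric DataFrame rather than one long string, so the scientific-notation values are parsed as floats.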

belka
  • Thanks. Took me a bit of reading, but DataFrames from pandas seem the way to go. Now it's just a question of how to extract only new data as it's added to the original temp text file. – mattjd Aug 24 '17 at 12:03
  • You should try reading this: https://docs.python.org/2/library/logging.html As far as I know, it would be complicated to both read the text file and write to it because of concurrency problems. You should also try this: https://stackoverflow.com/questions/32594137/streaming-data-for-pandas-df I would process in batches of files, performing analysis on part of the data rather than directly analysing incoming chunks. Or look into streaming systems like Spark Streaming and Kafka. Don't forget to upvote if this suited you – belka Aug 24 '17 at 12:46
  • I have upvoted, but since I'm a complete noob, my rep is too low for it to register. Sorry, and thanks. I'm taking the temp file, reading it, keeping the last 100 readings in memory (on a rolling basis), processing the data based on the previous 100 readings, and then appending the original and processed data to a csv file. Also checking out NumPy arrays, as they give me more data-processing functionality. – mattjd Aug 24 '17 at 13:20
  • Yes, that's a good idea. Also, while you read the last 100 readings, don't forget to put a lock on your file. Another solution would be a balancing system, e.g. read from A and write to B, then switch following a given rule (like after 10 minutes have passed) – belka Aug 24 '17 at 14:10
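The rolling-window pipeline described in the comments (keep the last 100 readings, compute something over them, append original plus processed data to a csv) can be sketched with collections.deque, whose maxlen drops old readings automatically. The process() statistic and WINDOW size are illustrative placeholders:

```python
import csv
from collections import deque

WINDOW = 100  # the "last 100 readings" mentioned in the comments

readings = deque(maxlen=WINDOW)  # oldest readings fall off automatically

def process(window):
    """Placeholder analysis: mean of the second column over the window."""
    return sum(r[1] for r in window) / len(window)

def handle(row, out_writer):
    """Add one reading to the window, then append original + processed data."""
    readings.append(row)
    stat = process(readings)
    out_writer.writerow(list(row) + [stat])
```

In use, each tuple parsed from the temp file would be passed to handle() together with a csv.writer opened in append mode on the output file.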