2

I have a big text file with data from a spectroscopy measurement.

The first few lines look like this:

397.451 -48.38

397.585 -48.38

397.719 -48.38

397.853 -18.38

397.987 -3.38

398.121 6.62

398.256 -0.38

398.39  -1.38

398.524 7.62

398.658 4.62

398.792 -4.38

398.926 12.62

399.06  5.62

399.194 -6.38

399.328 -6.38

399.463 0.62

399.597 -6.38

399.731 -12.38

399.865 1.62

399.999 2.62

What I would like to do is create two lists, where one contains e.g. [397.451, 397.585, 397.719, ...]

and the other contains [-48.38, -48.38, -48.38, -18.38, -3.38, ...]

pippo1980
Johand
  • Use split() on each line, then append split()[0] to one new list (list1) and split()[1] to another (list2). – pippo1980 Feb 25 '21 at 18:12
  • OK, first you need to read the file line by line and append the values of each line to a list. – pippo1980 Feb 25 '21 at 18:14
  • Does this answer your question? [Reading specific columns from a text file in python](https://stackoverflow.com/questions/30216573/reading-specific-columns-from-a-text-file-in-python) – Kraigolas Feb 25 '21 at 18:37
  • I think pandas `read_csv` is the way to go for this. It'll give you a dataframe. – Kraigolas Feb 25 '21 at 18:39

4 Answers

1

Sticking to the basics:

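list1 = []
list2 = []
# Iterate over the file object to read one line at a time;
# the with block closes the file when done.
with open("big_text_file.txt") as fil:
    for text in fil:
        nums = text.split()
        try:
            x, y = float(nums[0]), float(nums[1])
        except (IndexError, ValueError):
            # Blank or malformed line: skip it.
            continue
        list1.append(x)
        list2.append(y)

print(list1)
print(list2)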

Explanation:

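  • Create two empty lists.
  • Since it is a big text file, read it line by line rather than loading it all at once.
  • Split each line on whitespace (with no argument, split() splits on any run of whitespace, so lines with two spaces between the values are handled too).
  • If the conversion fails, the line is empty or malformed (that is what try and except are for).
  • Update the two lists (if there was no error).
  • Read the next line.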

Output:

[397.451, 397.585, 397.719, 397.853, 397.987, 398.121, 398.256, 398.39, 398.524, 398.658, 398.792, 398.926, 399.06, 399.194, 399.328, 399.463, 399.597, 399.731, 399.865, 399.999]
[-48.38, -48.38, -48.38, -18.38, -3.38, 6.62, -0.38, -1.38, 7.62, 4.62, -4.38, 12.62, 5.62, -6.38, -6.38, 0.62, -6.38, -12.38, 1.62, 2.62]
Rishabh Kumar
  • This did the job perfectly, thank you so much! – Johand Feb 25 '21 at 18:39
  • @Rishabh Kumar is it faster for very big files; try: nums = text.split() except: pass – pippo1980 Feb 25 '21 at 18:59
  • See, it may not be the fastest way to do it, but it's memory efficient. As OP said, it's a very big text file. Let's assume something in GBs and say system memory is 4GB (pretty common); then loading the whole file at once could pose a problem. If you have enough memory in your system, there are other options too, like loading the entire text file into memory using `readlines`, which could be faster. – Rishabh Kumar Feb 25 '21 at 19:22
  • @Rishabh Kumar I am trying to evaluate the time needed with different option using begin0 = datetime.now() , time.process_time(), time.perf_counter() and then after the script print('time 0 :' , datetime.now() - begin0[0], ' process_time : ', time.process_time() - begin0[1] , ' perf_counter : ', time.perf_counter() - begin0[2],'\n\n') – pippo1980 Feb 26 '21 at 11:10
  • But with the example file I am getting different results (i.e. the fastest isn't always the same script). How big should the initial file be to see consistent results? Am I missing the right way to evaluate the speed of a script? Sorry to bother, but I am under a question ban (PS I voted for your answer). – pippo1980 Feb 26 '21 at 11:10
  • Yes, you are on the right track when you mentioned how big the data should be. If you want to compare two codes, compare them on their weaker part. Try this, replicate OP's data say 500 times so its like around 10000 lines. And then compare the two algorithms. At this point I think, you should start seeing consistency with one algo always surpassing the other. (in terms of time requirements). – Rishabh Kumar Feb 26 '21 at 12:02
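One way to run the comparison discussed in these comments is the standard-library `timeit` module, which repeats each call many times and so smooths out run-to-run noise. This is only a minimal sketch: it assumes a few rows from the question replicated 500 times in memory, and the function names `parse_loop` and `parse_zip` are hypothetical stand-ins for the scripts being compared.

```python
import io
import timeit

# Assumption: a few rows from the question, replicated 500 times
# to stand in for a larger file.
SAMPLE = "397.451 -48.38\n397.585 -48.38\n398.39  -1.38\n399.06  5.62\n"
BIG_TEXT = SAMPLE * 500


def parse_loop(text):
    """Line-by-line parsing with two append calls per row."""
    xs, ys = [], []
    for line in io.StringIO(text):
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or short lines
        xs.append(float(parts[0]))
        ys.append(float(parts[1]))
    return xs, ys


def parse_zip(text):
    """Parse every row first, then transpose with zip(*rows)."""
    rows = [[float(n) for n in line.split()]
            for line in io.StringIO(text) if line.strip()]
    xs, ys = (list(col) for col in zip(*rows))
    return xs, ys


# Both approaches must agree before their timings are worth comparing.
assert parse_loop(BIG_TEXT) == parse_zip(BIG_TEXT)

# repeat() returns one total per run; the minimum is the least noisy figure.
for func in (parse_loop, parse_zip):
    best = min(timeit.repeat(lambda: func(BIG_TEXT), number=20, repeat=5))
    print(f"{func.__name__}: {best / 20 * 1e3:.3f} ms per call")
```

The timings depend on the machine and the Python version, so the sketch prints them rather than asserting which approach wins.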
0

Use the csv library: https://docs.python.org/3/library/csv.html

Solution:

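import csv

column_A = []
column_B = []
with open("spectroscopy.txt", newline="") as csvfile:
    # skipinitialspace=True copes with rows that have two spaces between values.
    reader = csv.reader(csvfile, delimiter=" ", skipinitialspace=True)
    for row in reader:
        try:
            a, b = float(row[0]), float(row[1])
        except (IndexError, ValueError):
            continue  # blank or malformed row
        column_A.append(a)
        column_B.append(b)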
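
A short sketch of how `csv.reader` treats the rows that have two spaces between the values, with and without `skipinitialspace`. It assumes two rows from the question held in an in-memory buffer rather than a file on disk.

```python
import csv
import io

# Assumption: two rows from the question, one of them with two spaces
# between the values, held in memory instead of a file.
sample = "397.451 -48.38\n398.39  -1.38\n"

# With a single-space delimiter, the double space yields an empty field.
plain = list(csv.reader(io.StringIO(sample), delimiter=" "))
print(plain)    # [['397.451', '-48.38'], ['398.39', '', '-1.38']]

# skipinitialspace=True drops whitespace that follows a delimiter.
skipped = list(csv.reader(io.StringIO(sample), delimiter=" ",
                          skipinitialspace=True))
print(skipped)  # [['397.451', '-48.38'], ['398.39', '-1.38']]
```

Without that option, `row[1]` on the second row is an empty string, and `float("")` raises `ValueError`.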

Alternative with pandas:

import pandas as pd

# sep=r"\s+" treats any run of whitespace as one separator (some rows have two spaces).
data = pd.read_csv("spectroscopy.txt", sep=r"\s+", header=None, index_col=0)
0K9S
0
spect_list = []
spect_list_a = []
spect_list_b = []

with open('spect.txt') as f:
    for i in f.readlines():             # read entire file as lines
        i = i.rstrip('\n')              # remove newline character
        if i:                           # discard blank lines
            spect_list.append(i)
            spect_list_a.append(i.split()[0])
            spect_list_b.append(i.split()[1])

print(spect_list)
print(spect_list_a)
print(spect_list_b)

This gives Python lists whose elements are strings (shown with quotes, e.g. '397.451'), so I was not sure it was the right answer.

Got it: convert each value with float() when appending:

spect_list_a.append(float(i.split()[0]))
spect_list_b.append(float(i.split()[1]))
pippo1980
0

This uses a transposition trick (zip(*data)) and the quoting parameter to auto-convert the columns to float. Also, skipinitialspace handles the couple of lines with two spaces between the values.

import csv

# The quoting value auto-converts numeric columns to float.
with open('input.csv',newline='') as f:
    r = csv.reader(f,delimiter=' ',quoting=csv.QUOTE_NONNUMERIC,skipinitialspace=True)
    data = list(r)

# transpose row/col data and convert to list (otherwise, it would be tuple)
col1,col2 = [list(col) for col in zip(*data)]
print(col1)
print(col2)

Output:

[397.451, 397.585, 397.719, 397.853, 397.987, 398.121, 398.256, 398.39, 398.524, 398.658, 398.792, 398.926, 399.06, 399.194, 399.328, 399.463, 399.597, 399.731, 399.865, 399.999]
[-48.38, -48.38, -48.38, -18.38, -3.38, 6.62, -0.38, -1.38, 7.62, 4.62, -4.38, 12.62, 5.62, -6.38, -6.38, 0.62, -6.38, -12.38, 1.62, 2.62]
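
The two ideas can also be seen in isolation. The sketch below assumes three rows from the question held in an `io.StringIO` buffer rather than `input.csv`: `csv.QUOTE_NONNUMERIC` makes the reader convert every unquoted field to float, and `zip(*rows)` regroups the rows into columns.

```python
import csv
import io

# Assumption: three rows from the question, held in memory instead of input.csv.
sample = io.StringIO("397.451 -48.38\n398.39  -1.38\n399.06  5.62\n")

reader = csv.reader(sample, delimiter=" ", quoting=csv.QUOTE_NONNUMERIC,
                    skipinitialspace=True)
rows = list(reader)
print(rows)      # [[397.451, -48.38], [398.39, -1.38], [399.06, 5.62]]

# zip(*rows) passes each row as a separate argument, so the first tuple
# collects every first element, the second every second element.
columns = list(zip(*rows))
print(columns)   # [(397.451, 398.39, 399.06), (-48.38, -1.38, 5.62)]
```

Because `zip` yields tuples, the answer wraps each column in `list(...)`.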

Using pandas:

import pandas as pd
data = pd.read_csv('input.csv',sep=' ',skipinitialspace=True,header=None)
col1 = list(data[0])
col2 = list(data[1])
print(col1)
print(col2)

Using no imports:

with open('input.csv') as f:
    data = [[float(n) for n in row.split()] for row in f if row.strip()]  # skip blank lines
col1,col2 = [list(n) for n in zip(*data)]
print(col1)
print(col2)
Mark Tolonen