
So I have a text document with a lot of values from calculations. I have extracted all the data and stored it in an array, but the elements are not numbers I can use for anything. I want to use the numbers to plot them in a graph, but the elements in the array are text strings. How would I turn them into numbers and remove unnecessary characters like commas and n=, for instance?

Here is my code, and below it is the output of my print statement.

import numpy as np



['n=1', 'n=2', 'n=3', 'n=4', 'n=5', 'n=6', 'n=7', 'n=8', 'n=9', 'n=10', 'n=11', 'n=12', 'n=13', 'n=14', 'n=15', 'n=16', 'n=17', 'n=18', 'n=19']
puhhy ahh

3 Answers


From the given array in the question, if you would like to remove the 'n=' and convert each element to an integer, you may try the following.

import numpy as np

array = np.array(['n=1', 'n=2', 'n=3', 'n=4', 'n=5', 'n=6', 'n=7', 'n=8', 'n=9', 
'n=10', 'n=11', 'n=12', 'n=13', 'n=14', 'n=15', 'n=16', 'n=17', 'n=18', 'n=19'])

# strip the 'n=' prefix and convert each element to an integer
array = [int(i.replace('n=', '')) for i in array]

print(array)
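Note that the list comprehension returns a plain Python list. If you want to stay in NumPy, a vectorized variant is also possible (a minimal sketch, assuming the same array of 'n=...' strings as above):

import numpy as np

array = np.array(['n=1', 'n=2', 'n=3', 'n=19'])

# strip the 'n=' prefix element-wise, then cast the remaining strings to integers
numbers = np.char.replace(array, 'n=', '').astype(int)
print(numbers)  # [ 1  2  3 19]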
Destroyer-byte

I'd use the conversion method presented in this post within the extract function, so e.g.

...
delta_x.append(strtofloat(words[1]))
...

where you might as well do the conversion inline (my strtofloat is a function you'd have to write based on the mentioned post) and within a try/except block, so failed conversions are simply left out of your list.

To make it more consistent, any conversion error should discard the whole affected line, so you might want to use intermediate variables and a check for each field.
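A minimal sketch of that idea (the word positions words[1]/words[7]/words[10] are borrowed from another answer in this thread, and the strtofloat helper is a stand-in, not the OP's actual code):

def strtofloat(s):
    # hypothetical helper: drop a trailing comma, then convert to float
    return float(s.strip().rstrip(','))

def extract(filename):
    delta_x, abs_error, n = [], [], []
    with open(filename) as infile:
        for line in infile:
            words = line.split()
            try:
                # convert every field first; append only if the whole line parses
                new_delta_x = strtofloat(words[1])
                new_abs_error = strtofloat(words[7])
                new_n = int(words[10].replace('n=', '').rstrip(','))
            except (ValueError, IndexError):
                continue  # any conversion error discards the whole line
            delta_x.append(new_delta_x)
            abs_error.append(new_abs_error)
            n.append(new_n)
    return delta_x, abs_error, n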

Btw, about the argument to the extract function: it would seem logical to make that argument a string containing the file name from which to extract the data.

EDIT: As a side note, you might want to look into pandas, a library specialised in numerical data handling. Depending on the format of your data file, there are probably standard functions to read your whole file into a DataFrame (a kind of super-charged array class that can handle a lot of data processing as well) in a single command.
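For example, something like the following reads the whole file in one call (a sketch assuming a comma-separated file with no header row; the file name is the one used in another answer here):

import pandas as pd

# each column gets an integer name: 0, 1, 2, ...
df = pd.read_csv('approx_derivative_sine.txt', header=None)
print(df.head())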

ppenguin

I would consider using regular expressions:

import re

# raw string avoids the invalid-escape warning for '\.'
match_number = re.compile(r'-?[0-9]+\.?[0-9]*(?:[Ee]-?[0-9]+)?')

delta_x, abs_error, n = [], [], []
with open('approx_derivative_sine.txt') as infile:
    for line in infile:
        words = line.split()

        # pull the first numeric token out of each relevant field
        new_delta_x = float(re.search(match_number, words[1]).group())
        new_abs_error = float(re.search(match_number, words[7]).group())
        new_n = int(re.search(match_number, words[10]).group())

        delta_x.append(new_delta_x)
        abs_error.append(new_abs_error)
        n.append(new_n)

But it seems like your data is already in CSV format, so try using pandas. Read the data into a DataFrame without a header (the column names will then be integers).

import numpy as np
import pandas as pd
df = pd.read_csv('approx_derivative_sine.txt', header=None)
delta_x = df[1].to_numpy()
abs_error = df[7].to_numpy()
# if n is always number of the row
n = df.index.to_numpy(dtype=int)
# if n is always in the form 'n=<integer>'
n = df[10].apply(lambda x: x.strip()[2:]).to_numpy(dtype=int)
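
Since the end goal is to plot the values, here is a minimal matplotlib sketch using the arrays built above (assuming matplotlib is installed; whether log or linear axes make sense depends on your data):

import matplotlib.pyplot as plt

# plot absolute error against delta_x; log-log axes are a common choice for error data
plt.loglog(delta_x, abs_error, marker='o')
plt.xlabel('delta_x')
plt.ylabel('absolute error')
plt.show()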

If you could post a few rows of your approx_derivative_sine.txt file, that would be useful.

Mark