I have some code that reads a file of tab-separated values (TSV). It works fine when the first column is a number, but fails when it is a string.
import os
import numpy as np
input_file = os.path.normpath('C:/Users/sturaroa/Documents/PycharmProjects/my_file.tsv')
# read values from file, by column
my_data = np.genfromtxt(input_file, delimiter='\t', skip_header=0)
print('my_data\n' + str(my_data))
groups = my_data[:, 0] # 1st column
X = my_data[:, 1] # 2nd column
Y = my_data[:, 2] # 3rd column
errors = my_data[:, 3] # 4th column (errors)
print('\ngroups ' + str(groups) + '\nX ' + str(X) + '\nY ' + str(Y) + '\nerrors ' + str(errors))
This is the file content (tab separated)
2.4 2 4.0 0.0
2.4 4 8.210526 0.7254761
2.9 4 8.4 0.8081221
2.9 6 12.52 1.0544369
The program prints this
my_data
[[ 2.4 2. 4. 0. ]
[ 2.4 4. 8.210526 0.7254761]
[ 2.9 4. 8.4 0.8081221]
[ 2.9 6. 12.52 1.0544369]]
groups [ 2.4 2.4 2.9 2.9]
X [ 2. 4. 4. 6.]
Y [ 4. 8.210526 8.4 12.52 ]
errors [ 0. 0.7254761 0.8081221 1.0544369]
I've seen this question, which suggests using dtype=None. However, if I do that, I get this error:
Traceback (most recent call last):
File "C:/Users/sturaroa/Documents/PycharmProjects/2d_plot_test.py", line 11, in <module>
groups = my_data[:, 0] # 1st column
IndexError: too many indices for array
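A likely explanation for the IndexError, sketched on an in-memory copy of the first two rows of the string input: with dtype=None and columns of different types, genfromtxt returns a 1-D structured array (one record per row) rather than a 2-D array, so `my_data[:, 0]` asks for one index too many. The `lines` list and the name `records` are only for illustration. The `encoding` argument is an assumption about the environment: it exists in NumPy 1.14 and later, so it would have to be dropped on 1.9.2.

```python
import numpy as np

# In-memory stand-in for the file (illustrative; tab separated)
lines = [
    "something\t2\t4.0\t0.0",
    "something\t4\t8.210526\t0.7254761",
]

# dtype=None lets genfromtxt guess a type per column; with mixed types
# the result is a 1-D structured array, one record per row.
# encoding requires NumPy >= 1.14 (assumption about the environment).
records = np.genfromtxt(lines, delimiter='\t', dtype=None, encoding='utf-8')

print(records.shape)        # a single dimension: the number of rows
print(records.dtype.names)  # auto-generated field names, one per column

# 2-D style indexing fails because there is only one axis
try:
    records[:, 0]
except IndexError as exc:
    print('IndexError: ' + str(exc))

# Columns are selected by field name instead
groups = records['f0']  # 1st column (strings)
X = records['f1']       # 2nd column
Y = records['f2']       # 3rd column
errors = records['f3']  # 4th column (errors)
```

If this reading is right, replacing the `[:, i]` lookups with field names should be enough to keep the rest of the script unchanged.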
I need to adjust my code to work with an input like this
something 2 4.0 0.0
something 4 8.210526 0.7254761
some_other_thing 4 8.4 0.8081221
some_other_thing 6 12.52 1.0544369
The first column is a string of variable length; the other columns are numbers (int or float).
I'm using numpy 1.9.2 on Python 2.7.
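One direction that might keep the existing `[:, i]` indexing is to read the file in two passes with `usecols`: the numeric columns as an ordinary 2-D float array, and the string column on its own. This is only a sketch under assumptions: the `lines` list stands in for the file, and the third row is a hypothetical example added to show a label of a different length. With a real file, `input_file` would be passed in place of `lines`, and the file would be parsed twice.

```python
import numpy as np

# In-memory stand-in for the file (illustrative; tab separated).
# The third row is hypothetical, added to show a longer label.
lines = [
    "something\t2\t4.0\t0.0",
    "something\t4\t8.210526\t0.7254761",
    "some_other_thing\t4\t8.4\t0.8081221",
]

# Pass 1: numeric columns only, giving a regular 2-D float array
numbers = np.genfromtxt(lines, delimiter='\t', usecols=(1, 2, 3))

# Pass 2: the string column alone, read as text
groups = np.genfromtxt(lines, delimiter='\t', usecols=(0,), dtype=str)

X = numbers[:, 0]       # 2nd column of the file
Y = numbers[:, 1]       # 3rd column of the file
errors = numbers[:, 2]  # 4th column of the file (errors)

print('groups ' + str(groups) + '\nX ' + str(X) + '\nY ' + str(Y) + '\nerrors ' + str(errors))
```

Note that X comes back as float here, as it does with the all-numeric file, since the three numeric columns share one array.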