Reading the file line by line and saving each line in a list is equivalent to making a list of strings:
In [98]: txt='''0 1.2 2 3.1
...: 20 21.2 22 23
...: 30 31 32 33.01'''.splitlines()
In [99]: txt
Out[99]: ['0 1.2 2 3.1', '20 21.2 22 23', '30 31 32 33.01']
Making an array from that just produces a 1d array of strings. That can't be indexed as a 2d numeric array:
In [100]: np.array(txt)
Out[100]: array(['0 1.2 2 3.1', '20 21.2 22 23', '30 31 32 33.01'], dtype='<U14')
If you first split the lines into substrings, you get a list of lists, which np.array can convert to floats:
In [101]: [line.split() for line in txt]
Out[101]:
[['0', '1.2', '2', '3.1'],
['20', '21.2', '22', '23'],
['30', '31', '32', '33.01']]
In [102]: np.array([line.split() for line in txt], dtype=float)
Out[102]:
array([[ 0. , 1.2 , 2. , 3.1 ],
[20. , 21.2 , 22. , 23. ],
[30. , 31. , 32. , 33.01]])
This is a 2d array.
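Applied to an actual file, the same idea might look like the sketch below. The temporary file and its contents are made up for illustration; substitute your own path.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample file, created only so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("0 1.2 2 3.1\n20 21.2 22 23\n30 31 32 33.01\n")
    path = f.name

# Split each non-blank line into substrings while reading.
with open(path) as f:
    rows = [line.split() for line in f if line.strip()]

# Convert the nested list of strings to a 2d float array.
arr = np.array(rows, dtype=float)
os.remove(path)

print(arr.shape)  # (3, 4)
```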
We can select a column from that array. But note that the result is a 1d array:
In [104]: np.array([line.split() for line in txt], dtype=float)[:,1]
Out[104]: array([ 1.2, 21.2, 31. ])
Don't worry about this being a 'row' or 'column'. We could change the shape to (1,3) or (3,1), but for most numpy purposes a 1d shape, (3,), is just as good.
numpy has a good csv loader (two, actually):
In [105]: np.genfromtxt(txt)
Out[105]:
array([[ 0. , 1.2 , 2. , 3.1 ],
[20. , 21.2 , 22. , 23. ],
[30. , 31. , 32. , 33.01]])
Usually genfromtxt is given a file name, but it works fine with anything that feeds it lines, such as the txt list.
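As a rough illustration of "anything that feeds it lines", here is a sketch that passes genfromtxt a file-like object and a generator instead of a list. The text variable is just the same sample data joined into one string.

```python
import io

import numpy as np

text = "0 1.2 2 3.1\n20 21.2 22 23\n30 31 32 33.01"

# A file-like object: genfromtxt iterates over it line by line.
from_buffer = np.genfromtxt(io.StringIO(text))

# A generator that yields one line at a time is handled the same way.
from_generator = np.genfromtxt(line for line in text.splitlines())

print(from_buffer.shape)  # (3, 4)
print(np.array_equal(from_buffer, from_generator))
```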
And if you are interested in just one column, you can specify it with usecols (there are many more possible parameters):
In [106]: np.genfromtxt(txt, usecols=1)
Out[106]: array([ 1.2, 21.2, 31. ])
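For comparison, a short sketch with np.loadtxt, numpy's other text loader. It accepts the same list of lines and the same usecols parameter; passing a tuple selects several columns at once.

```python
import numpy as np

txt = ["0 1.2 2 3.1", "20 21.2 22 23", "30 31 32 33.01"]

# A single column comes back as a 1d array.
one_col = np.loadtxt(txt, usecols=1)

# A tuple of column indices comes back as a 2d array, one column per index.
two_cols = np.loadtxt(txt, usecols=(1, 3))

print(one_col.shape)   # (3,)
print(two_cols.shape)  # (3, 2)
```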
genfromtxt isn't the fastest loader. If you need more speed, we usually recommend the pandas loader. pandas has some fancier handling of quotes and missing values, but you don't seem to need that here. numpy users don't seem to use the csv module much; it may be that it just isn't needed.
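If you do want to try the pandas route, something like the following should work for whitespace-separated data like this. It is only a sketch: it assumes pandas is installed and that the file has no header row, and it reads from a string buffer rather than a real file.

```python
import io

import pandas as pd

text = "0 1.2 2 3.1\n20 21.2 22 23\n30 31 32 33.01"

# sep=r"\s+" splits on runs of whitespace; header=None because the
# sample data has no header line (an assumption about your file).
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# Pull one column out as a 1d numpy array.
col1 = df[1].to_numpy()

# Or convert the whole frame to a 2d float array.
arr = df.to_numpy(dtype=float)

print(col1.shape, arr.shape)  # (3,) (3, 4)
```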
If you really must have a column vector, here's how to use reshape:
In [110]: col1 = np.genfromtxt(txt, usecols=1)
In [111]: col1
Out[111]: array([ 1.2, 21.2, 31. ])
In [112]: col1.reshape(3,1)
Out[112]:
array([[ 1.2],
[21.2],
[31. ]])
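A few equivalent ways to get the same column (or row) shape, in case you'd rather not hard-code the length. The col1 values are copied from the output above.

```python
import numpy as np

col1 = np.array([1.2, 21.2, 31.0])    # shape (3,)

# -1 tells reshape to infer that dimension from the array's size.
as_column = col1.reshape(-1, 1)

# Indexing with np.newaxis adds a length-1 axis without naming any sizes.
also_column = col1[:, np.newaxis]
as_row = col1[np.newaxis, :]

print(as_column.shape, also_column.shape, as_row.shape)  # (3, 1) (3, 1) (1, 3)
```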
Let's build the array line by line:
In [116]: data = []
In [117]: for line in txt:
...:     arr = np.array(line.split(), dtype=float)
...:     print(arr.shape)
...:     data.append(arr)
...:     print(data)
...:
...:
(4,)
[array([0. , 1.2, 2. , 3.1])]
(4,)
[array([0. , 1.2, 2. , 3.1]), array([20. , 21.2, 22. , 23. ])]
(4,)
[array([0. , 1.2, 2. , 3.1]), array([20. , 21.2, 22. , 23. ]), array([30. , 31. , 32. , 33.01])]
data is now a list of arrays:
In [118]: data
Out[118]:
[array([0. , 1.2, 2. , 3.1]),
array([20. , 21.2, 22. , 23. ]),
array([30. , 31. , 32. , 33.01])]
Join those arrays into one array:
In [119]: np.array(data)
Out[119]:
array([[ 0. , 1.2 , 2. , 3.1 ],
[20. , 21.2 , 22. , 23. ],
[30. , 31. , 32. , 33.01]])
(Check the shape after doing this. If the individual arrays differ in shape, the result will not be a 2d array. Older numpy versions return a 1d object-dtype array of arrays, which is closer to the original list of arrays; newer versions raise an error unless you pass dtype=object.)
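One way to make that check automatic is to join with np.stack, which refuses to combine arrays of different shapes. This is a sketch with made-up short arrays, not the data above; the object-array part shows what a "1d array of arrays" looks like when built explicitly.

```python
import numpy as np

same = [np.array([0.0, 1.2]), np.array([20.0, 21.2])]
ragged = [np.array([0.0, 1.2]), np.array([20.0, 21.2, 22.0])]

# Equal shapes: np.stack gives a regular 2d array.
stacked = np.stack(same)
print(stacked.shape)  # (2, 2)

# Unequal shapes: np.stack raises instead of silently producing something else.
try:
    np.stack(ragged)
    stack_failed = False
except ValueError:
    stack_failed = True
print(stack_failed)  # True

# Building the 1d array of arrays by hand, one element at a time.
obj = np.empty(len(ragged), dtype=object)
for i, a in enumerate(ragged):
    obj[i] = a
print(obj.shape)  # (2,)
```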