Here are various ways of loading this text:
In [470]: txt=b"""THE 77534223
...: AND 30997177
...: ING 30679488
...: ENT 17902107
...: ION 17769261
...: HER 15277018
...: FOR 14686159
...: THA 14222073
...: NTH 14115952"""
Let genfromtxt deduce the correct column dtype:
In [471]: data = np.genfromtxt(txt.splitlines(),dtype=None)
In [472]: data
Out[472]:
array([(b'THE', 77534223), (b'AND', 30997177), (b'ING', 30679488),
(b'ENT', 17902107), (b'ION', 17769261), (b'HER', 15277018),
(b'FOR', 14686159), (b'THA', 14222073), (b'NTH', 14115952)],
dtype=[('f0', 'S3'), ('f1', '<i4')])
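The first field above comes back as bytes ('S3'). As a minimal sketch, assuming a NumPy version whose genfromtxt accepts the encoding argument, passing an explicit encoding together with dtype=None should yield str fields instead. The shortened sample text and the variable names here are illustrative, and the integer width ('<i4' vs '<i8') may differ by platform.

```python
import io
import numpy as np

# Shortened copy of the sample data, as str rather than bytes.
text = """THE 77534223
AND 30997177
ING 30679488"""

# dtype=None lets genfromtxt infer each column's type;
# encoding="utf-8" asks for str (unicode) fields rather than bytes.
data = np.genfromtxt(io.StringIO(text), dtype=None, encoding="utf-8")

print(data.dtype)   # structured dtype with auto-named fields f0 and f1
print(data["f0"])   # the string column
print(data["f1"])   # the integer column
```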
This is not the right dtype specification: the result is a 2-D string array, like yours, but with just 1 character per element.
In [473]: data = np.genfromtxt(txt.splitlines(),dtype=(str, int))
In [474]: data
Out[474]:
array([['T', '7'],
['A', '3'],
['I', '3'],
['E', '1'],
['I', '1'],
['H', '1'],
['F', '1'],
['T', '1'],
['N', '1']],
dtype='<U1')
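A little better (a structured array with an integer field), but the strings are too short: the string field is a zero-length '<U', so every string comes out empty.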
In [475]: data = np.genfromtxt(txt.splitlines(),dtype='str,int')
In [476]: data
Out[476]:
array([('', 77534223), ('', 30997177), ('', 30679488), ('', 17902107),
('', 17769261), ('', 15277018), ('', 14686159), ('', 14222073),
('', 14115952)],
dtype=[('f0', '<U'), ('f1', '<i4')])
Giving the string field an explicit length produces a result similar to the dtype=None case, with str rather than bytes values:
In [477]: data = np.genfromtxt(txt.splitlines(),dtype='U10,int')
In [478]: data
Out[478]:
array([('THE', 77534223), ('AND', 30997177), ('ING', 30679488),
('ENT', 17902107), ('ION', 17769261), ('HER', 15277018),
('FOR', 14686159), ('THA', 14222073), ('NTH', 14115952)],
dtype=[('f0', '<U10'), ('f1', '<i4')])
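The explicit-dtype approach can also carry field names, which makes the columns easier to address than f0 and f1. The following is a sketch under assumptions: the field names "trigram" and "count", the 'U3' width, and the shortened sample text are illustrative choices, not part of the original session.

```python
import io
import numpy as np

# Shortened copy of the sample data, as str rather than bytes.
text = """THE 77534223
AND 30997177
ING 30679488"""

# Hypothetical field names; 'U3' holds a 3-character string,
# 'i8' is a 64-bit integer regardless of platform.
dtype = [("trigram", "U3"), ("count", "i8")]

data = np.genfromtxt(io.StringIO(text), dtype=dtype, encoding="utf-8")

print(data["trigram"])      # string column, addressed by name
print(data["count"].sum())  # total of the integer column
```

Spelling out the string width avoids the empty-string result seen with a bare 'str', and naming the integer type removes the platform dependence of plain int.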