
Say I have a file myfile.txt containing:

1   2.0000  buckle_my_shoe
3   4.0000  margery_door

How do I import the data from the file into a numpy array, with the columns as int, float and string?

I am aiming to get:

array([[1, 2.0000, "buckle_my_shoe"],
       [3, 4.0000, "margery_door"]])

I've been playing around with the following to no avail:

a = numpy.loadtxt('myfile.txt',dtype=(numpy.int_,numpy.float_,numpy.string_))

EDIT: Another approach might be to use the ndarray type and convert afterwards.

b = numpy.loadtxt('myfile.txt',dtype=numpy.ndarray)

    array([['1', '2.0000', 'buckle_my_shoe'],
           ['3', '4.0000', 'margery_door']], dtype=object)
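
A minimal sketch of that convert-afterwards idea, continuing from the object array b above (the per-column variable names are just illustrative):

ints = b[:, 0].astype(int)      # first column as int
floats = b[:, 1].astype(float)  # second column as float
strings = b[:, 2].astype(str)   # third column as str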
Lee
  • Just read the file into a string, split it on each `\n` newline, then split each line on the whitespace between the fields. Alternatively, you can use regular expressions to match each line and capture the fields as groups (a plain-Python sketch of this idea appears after these comments). –  Mar 18 '13 at 16:18
  • I think the more important question is what you're going to do with this data after it's imported. While you can use `numpy` to work with non-numerical data, if you want to do anything fun with it you're probably going to wind up reinventing bits of `pandas`. – DSM Mar 18 '13 at 16:24
  • For more basic explanations you might want to look at http://stackoverflow.com/a/10940038/2062965 – strpeter Nov 24 '14 at 10:04
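
A rough sketch of the plain-Python splitting suggested in the first comment (no numpy involved; the tuple layout below is just one option):

with open('myfile.txt') as f:
    rows = [line.split() for line in f if line.strip()]  # split each non-empty line on whitespace
data = [(int(a), float(b), c) for a, b, c in rows]       # convert the first two fields per row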

2 Answers


Use numpy.genfromtxt:

import numpy as np
np.genfromtxt('myfile.txt', dtype=None)
# array([(1, 2.0, 'buckle_my_shoe'), (3, 4.0, 'margery_door')], 
#       dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '|S14')])
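
The result is a structured array, so each column can be pulled out afterwards by its field name (a small sketch; on newer NumPy you may also want to pass encoding=None so the last column comes back as str rather than bytes):

data = np.genfromtxt('myfile.txt', dtype=None)
ints = data['f0']     # int column
floats = data['f1']   # float column
strings = data['f2']  # string column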
root

Pandas can do that for you. The docs for `DataFrame.from_csv` describe the available options.

Assuming your columns are tab separated, this should do the trick (adapted from this question):

from pandas import DataFrame

df = DataFrame.from_csv('myfile.txt', sep='\t')
array = df.values  # the array you are interested in
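
Since the sample file is actually whitespace-separated with no header row, a `read_csv` variant might be a closer fit (a sketch; the column names here are made up):

import pandas as pd

df = pd.read_csv('myfile.txt', delim_whitespace=True, header=None,
                 names=['a', 'b', 'c'])
array = df.values  # mixed int/float/str columns come back as an object array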
mtth