I have a CSV file which has 78000 columns. I am trying to select the columns 2-100, 102-200, and the last 300 columns. The rest of the columns need to be skipped.

I have used numpy.loadtxt to select a single range of columns:

numpy.loadtxt(input_file_name, delimiter=",", skiprows = 1, usecols=range(1,99))

How can I select several blocks of columns in a similar way, for example:

numpy.loadtxt(input_file_name, delimiter=",", skiprows = 1, usecols=(range(1,99),range(101,199),range(74999,77999)))
cs95
SGh
  • This is a general duplicate, but I've added a numpy solution to my answer which should be useful to know. If anyone else wants to answer the question, ping me and I'll reopen it, as long as the solution isn't linked in the duplicate (i.e., a numpy solution). – cs95 Jan 19 '18 at 09:13
  • Thanks for accepting! You can also upvote answers if they were useful, so please consider doing so. Thanks. – cs95 Jan 19 '18 at 09:58

1 Answer

Use NumPy's index concatenation helper, np.r_, which joins several ranges or slices into a single 1-D array of indices.

>>> np.r_[range(3), range(15, 18), range(100, 103)]

Or (using hpaulj's suggestion),

>>> np.r_[0:3, 15:18, 100:103]

array([  0,   1,   2,  15,  16,  17, 100, 101, 102])
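Slices passed to np.r_ are zero-based and half-open, whereas column numbers such as "2-100" are usually read as 1-based and inclusive. The sketch below shows one way to do that conversion. `column_blocks` is a hypothetical helper name used only for illustration, not a NumPy function, and it assumes the 1-based, inclusive reading. Under that reading, columns 2-100 correspond to range(1, 100) rather than range(1, 99).

```python
import numpy as np

def column_blocks(*blocks):
    """Turn 1-based inclusive (first, last) column pairs into 0-based indices."""
    # Column k (1-based) sits at index k - 1. The slice stop is exclusive,
    # so an inclusive last column `last` maps to a stop value of `last`.
    return np.r_[tuple(slice(first - 1, last) for first, last in blocks)]

# Columns 2-100 and 102-200, as described in the question.
idx = column_blocks((2, 100), (102, 200))
print(idx[0], idx[98], idx[99], idx[-1], idx.size)
```

The first block ends at index 99 (column 100), the second starts at index 101 (column 102), and each block contributes 99 indices.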

In your code, the call would look like this:

import numpy as np

np.loadtxt(
    input_file_name,
    delimiter=",",
    skiprows=1,
    # One flat index array built from the three blocks in the question
    usecols=np.r_[range(1, 99), range(101, 199), range(74999, 77999)],
)
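To try the idea without the real 78000-column file, here is a small self-contained sketch on an in-memory CSV. The data, the names `csv_text`, `n_cols`, and `last_n`, and the chosen blocks are all made up for illustration. It also shows one possible way to express "the last N columns" relative to the column count taken from the header row.

```python
import io
import numpy as np

# Tiny stand-in for the real file: a header row plus two data rows, 10 columns.
csv_text = (
    "c0,c1,c2,c3,c4,c5,c6,c7,c8,c9\n"
    "0,1,2,3,4,5,6,7,8,9\n"
    "10,11,12,13,14,15,16,17,18,19\n"
)

# Count the columns from the header so the trailing block can be written
# relative to the end of each row.
n_cols = len(csv_text.splitlines()[0].split(","))
last_n = 3

# Indices 1-2, indices 4-5, and the last three columns (indices 7-9).
cols = np.r_[1:3, 4:6, n_cols - last_n:n_cols]

data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=cols)
print(data.shape)
print(data)
```

For a file on disk, the column count could be obtained the same way by reading only the first line before calling loadtxt.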
cs95