
I have an array into which I want to insert a column at position 0, filled with the row index: values starting at 0 and running up to the length of the array minus one.

import io
import numpy as np

data = io.StringIO("""
ID,1,2
5362,0.9,-0.4
485,-0.6,0.5
582,0.0,0.9
99,0.7,0.5
75,-0.4,0.5
474,0.3,0.8
594,-0.2,0.0
597,0.9,-0.3
124,0.7,0.6
635,0.8,0.9
""")
data = np.genfromtxt(data, delimiter=',', skip_header=1, dtype=np.float64)

Expected:

IDX,ID,1,2
0,5362,0.9,-0.4
1,485,-0.6,0.5
2,582,0.0,0.9
3,99,0.7,0.5
4,75,-0.4,0.5
5,474,0.3,0.8
6,594,-0.2,0.0
7,597,0.9,-0.3
8,124,0.7,0.6
9,635,0.8,0.9
Matt Hall
axay
  • Consider converting it to a pandas DataFrame, then insert your column, and finally convert back to a NumPy array using to_numpy (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html) – exan Nov 21 '19 at 04:53
  • Actually I do not want to use pandas, as I want to run this on a GPU. So could this be done with NumPy only? – axay Nov 21 '19 at 05:15
  • Yes, there are many ways to do that... have a look here: https://stackoverflow.com/questions/8486294/how-to-add-an-extra-column-to-a-numpy-array – exan Nov 21 '19 at 05:22
  • You can easily concatenate on a `np.arange(10)[:,None]` array. But the result will be all floats. For fast numeric calculations, NumPy arrays have to have the same dtype throughout. There are ways of mixing dtypes, but that slows down the calculation. Do those first 2 columns have to be in the same array as the float columns? – hpaulj Nov 21 '19 at 05:59
  • @hpaulj - My 1st column has repeating values. So in order to give each row a unique value, I want to add a column so that I have access to the exact row number for further processing. – axay Nov 21 '19 at 06:13
  • What kind of processing requires integer columns and float ones? – hpaulj Nov 21 '19 at 06:24

2 Answers


This is probably a job for pandas. NumPy is really intended for situations where the numbers in an array are all measurements of the same thing. And I'd also add that you might not really need these indices in NumPy, since you can already ask for the n-th row with NumPy's indexing. But you can have more or less what you want if you're prepared to compromise a bit:

data = data[1:]
idx = np.arange(data.shape[0]).reshape(-1, 1)
np.hstack([idx, data])

In the first line, I've sliced off the header, because NumPy arrays don't have column headings like this. That's a pandas thing.

In the second line I've made a 'column' of monotonically increasing indices. This is a bunch of ints for now, but not for long.

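In the third line I've concatenated everything. Everything is floats now, because the integer indices are promoted to match the float data. A plain (non-structured) NumPy array has a single dtype, so you can't have one column of ints and 3 columns of floats... pandas again.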
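To see the dtype promotion for yourself, here is a minimal self-contained sketch of the three steps. It assumes a shortened, three-row version of the data with no leading blank line, so `skip_header=1` already drops the heading row and no slicing is needed; the names `text` and `result` are mine.

```python
import io
import numpy as np

# Shortened sample in the same layout as the question (assumed: no leading blank line).
text = "ID,1,2\n5362,0.9,-0.4\n485,-0.6,0.5\n582,0.0,0.9\n"
data = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1)

# A column vector of row numbers; still an integer dtype at this point.
idx = np.arange(data.shape[0]).reshape(-1, 1)

# Stacking ints next to floats promotes the whole result to float64.
result = np.hstack([idx, data])

print(idx.dtype, result.dtype)
print(result)
```

If you need the index as integers later, `result[:, 0].astype(int)` gives you an integer copy of that column.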

Matt Hall
In [110]: txt = """ 
     ...: ID,1,2 
     ...: 5362,0.9,-0.4 
     ...: 485,-0.6,0.5 
     ...: 582,0.0,0.9 
     ...: 99,0.7,0.5 
     ...: 75,-0.4,0.5 
     ...: 474,0.3,0.8 
     ...: 594,-0.2,0.0 
     ...: 597,0.9,-0.3 
     ...: 124,0.7,0.6 
     ...: 635,0.8,0.9 
     ...: """  

In [113]: data = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=2)   
In [114]: data                                                                  
Out[114]: 
array([[ 5.362e+03,  9.000e-01, -4.000e-01],
       [ 4.850e+02, -6.000e-01,  5.000e-01],
       [ 5.820e+02,  0.000e+00,  9.000e-01],
       ...
       [ 6.350e+02,  8.000e-01,  9.000e-01]])


In [118]: data1 = np.concatenate([np.arange(data.shape[0])[:,None],data], axis=1)                                                                     
In [119]: data1                                                                 
Out[119]: 
array([[ 0.000e+00,  5.362e+03,  9.000e-01, -4.000e-01],
       [ 1.000e+00,  4.850e+02, -6.000e-01,  5.000e-01],
       [ 2.000e+00,  5.820e+02,  0.000e+00,  9.000e-01],
       [ 3.000e+00,  9.900e+01,  7.000e-01,  5.000e-01],
         ...
       [ 9.000e+00,  6.350e+02,  8.000e-01,  9.000e-01]])

Alternatively, create 2 arrays: one holding the integer IDs, the other holding the float values:

In [124]: ID = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=2,usecols=[0],dtype=int)                                                     
In [126]: ID                                                                    
Out[126]: array([5362,  485,  582,   99,   75,  474,  594,  597,  124,  635])
In [127]: np.column_stack([np.arange(ID.shape[0]),ID])                          
Out[127]: 
array([[   0, 5362],
       [   1,  485],
       [   2,  582],
        ...
       [   9,  635]])
In [128]: data2 = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=2,usecols=[1,2])                                                          
In [129]: data2                                                                 
Out[129]: 
array([[ 0.9, -0.4],
       [-0.6,  0.5],
       [ 0. ,  0.9],
        ...
       [ 0.8,  0.9]])
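With the IDs and the float values in separate arrays, the row position already acts as the unique index, even when IDs repeat. A rough sketch of that idea, using made-up data in which ID 485 appears twice (the arrays `ids` and `values` are hypothetical, not the ones from the session above):

```python
import numpy as np

# Hypothetical data: integer IDs with a repeat, plus the float columns.
ids = np.array([5362, 485, 582, 485])
values = np.array([[0.9, -0.4],
                   [-0.6, 0.5],
                   [0.0, 0.9],
                   [0.7, 0.5]])

# Row numbers at which the ID equals 485; these identify each row uniquely.
rows = np.flatnonzero(ids == 485)
print(rows)          # [1 3]

# The same row numbers index the float array directly.
print(values[rows])
```

Both arrays keep their own dtype this way, so the float block stays a plain float64 array for numeric work.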

Or as a structured array:

In [120]: data2 = np.genfromtxt(txt.splitlines(), delimiter=',',skip_header=1,names=True, dtype=None)
In [121]: data2                                                                 
Out[121]: 
array([(5362,  0.9, -0.4), ( 485, -0.6,  0.5), ( 582,  0. ,  0.9),
       (  99,  0.7,  0.5), (  75, -0.4,  0.5), ( 474,  0.3,  0.8),
       ( 594, -0.2,  0. ), ( 597,  0.9, -0.3), ( 124,  0.7,  0.6),
       ( 635,  0.8,  0.9)],
      dtype=[('ID', '<i8'), ('1', '<f8'), ('2', '<f8')])

I could add another id column, and consolidate the float columns, but that can wait.
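For what it's worth, one possible way to add that id column is to allocate a new structured array with an extra leading field and copy the existing fields across. This is only a sketch: it assumes a small hand-built array with the same field names as above, and `arr`, `new_dtype` and `out` are names I've made up for illustration.

```python
import numpy as np

# Hand-built stand-in for the structured array above (first three rows only).
arr = np.array(
    [(5362, 0.9, -0.4), (485, -0.6, 0.5), (582, 0.0, 0.9)],
    dtype=[("ID", np.int64), ("1", np.float64), ("2", np.float64)],
)

# New dtype: an integer IDX field in front of the existing fields.
new_dtype = np.dtype([("IDX", np.int64)] + arr.dtype.descr)

# Allocate the result, then fill it field by field.
out = np.empty(arr.shape, dtype=new_dtype)
out["IDX"] = np.arange(arr.shape[0])
for name in arr.dtype.names:
    out[name] = arr[name]

print(out.dtype.names)
print(out)
```

Here the index and ID stay integers while the other fields stay floats, at the cost of working with named fields rather than a plain 2-D array.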

hpaulj