I have a CSV data file whose header row gives the column names.

xy   wz  hi kq
0    10  5  6
1    2   4  7
2    5   2  6

I run:

X = np.array(pd.read_csv('gbk_X_1.csv').values)

I want to get the column names:

['xy', 'wz', 'hi', 'kq']

I read this post, but the solution there gives me None.

Divakar
ebrahimi
  • np.genfromtxt() and names=True option might help. See https://stackoverflow.com/questions/12336234/read-csv-file-to-numpy-array-first-row-as-strings-rest-as-float – dkato Dec 01 '17 at 07:41
  • I think you need `pd.read_csv('gbk_X_1.csv').columns.tolist()` – jezrael Dec 01 '17 at 07:46
  • Is your problem getting the structured array or getting the names out of the structured array? If the latter: `list(x.dtype.fields)`. – Paul Panzer Dec 01 '17 at 08:05
  • Yes, it is also possible to use: `X = np.genfromtxt('gbk_X_1.csv', dtype=float, delimiter=',', names=True); print(X.dtype.names)` – ebrahimi Dec 01 '17 at 09:33
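
The np.genfromtxt() route suggested in the comments can be sketched as follows. This is a minimal example, not a definitive implementation: `csv_text` is a hypothetical in-memory stand-in for 'gbk_X_1.csv' and is assumed to be comma-separated, and the names `names` and `plain` are illustrative only.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical in-memory stand-in for 'gbk_X_1.csv' (assumed comma-separated).
csv_text = "xy,wz,hi,kq\n0,10,5,6\n1,2,4,7\n2,5,2,6\n"

# names=True reads the first row as field names, producing a structured array.
X = np.genfromtxt(io.StringIO(csv_text), dtype=float, delimiter=',', names=True)

# Column names as a plain Python list.
names = list(X.dtype.names)

# Convert the structured array to an ordinary 2-D float array.
plain = rfn.structured_to_unstructured(X)

print(names)
print(plain.shape)
```

Note that with names=True the result is a one-dimensional structured array (one record per row), so a conversion step like the one above is needed if a regular 2-D array is wanted.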

2 Answers

Let's assume your CSV file looks like this:

xy,wz,hi,kq
0,10,5,6
1,2,4,7
2,5,2,6

Then use pd.read_csv to load the file into a DataFrame:

df = pd.read_csv('gbk_X_1.csv')

The dataframe now looks like

df

   xy  wz  hi  kq
0   0  10   5   6
1   1   2   4   7
2   2   5   2   6

Its three main components are the

  • data which you can access via the values attribute

    df.values
    
    array([[ 0, 10,  5,  6],
           [ 1,  2,  4,  7],
           [ 2,  5,  2,  6]])
    
  • index which you can access via the index attribute

    df.index
    
    RangeIndex(start=0, stop=3, step=1)
    
  • columns which you can access via the columns attribute

    df.columns
    
    Index(['xy', 'wz', 'hi', 'kq'], dtype='object')
    

If you want the columns as a list, use the tolist method

df.columns.tolist()

['xy', 'wz', 'hi', 'kq']
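
Applied to the question, the point is that the one-liner there keeps only the values and throws the DataFrame away, so the names are lost. A minimal sketch that keeps both, assuming a comma-separated file (here replaced by a hypothetical in-memory string, `csv_text`, so the example runs on its own):

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for 'gbk_X_1.csv' (assumed comma-separated).
csv_text = "xy,wz,hi,kq\n0,10,5,6\n1,2,4,7\n2,5,2,6\n"

# Keep the DataFrame around instead of converting to an array immediately.
df = pd.read_csv(io.StringIO(csv_text))

X = df.values                  # the data as a NumPy array
columns = df.columns.tolist()  # the header as a plain Python list

print(columns)
print(X.shape)
```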
piRSquared

Use the following code:

import re

# Read only the header line; the with block closes the file afterwards
with open('f.csv', 'r') as f:
    header = f.readline()

columns = re.sub(' +', ' ', header)  # collapse runs of spaces into one
columns = columns.strip().split(' ')  # split on the single spaces

print(columns)

Assume the file is space-separated, like this:

xy   wz  hi kq
0    10  5  6
1    2   4  7
2    5   2  6
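
For a space-separated file like this one, the regular expression may not be needed at all: str.split() with no argument already splits on runs of whitespace and ignores leading and trailing whitespace. A minimal sketch, using a hypothetical in-memory list `lines` in place of the lines read from the file, and assuming every value is an integer:

```python
# Hypothetical stand-in for the lines of the space-separated file above.
lines = [
    "xy   wz  hi kq",
    "0    10  5  6",
    "1    2   4  7",
    "2    5   2  6",
]

# split() with no argument splits on any run of whitespace.
columns = lines[0].split()

# Parse the remaining lines the same way (assumes integer values).
rows = [[int(value) for value in line.split()] for line in lines[1:]]

print(columns)
print(rows)
```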
marc_s
Ahmad