0

Below is the content from myfile.csv

  1st        2nd     3rd      4th                     5th
2061100   10638650  -8000     25         [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
2061800   10639100  -8100     26         [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
2061150   10638750  -8250     25         [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
2061650   10639150  -8200     25         [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
2061350   10638800  -8250     3          [5.0, 5.0, 5.0]
2060950   10638700  -8000     1          [1.0]
2061700   10639100  -8100     11         [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
2061050   10638800  -8250     6          [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
2061500   10639150  -8200     1          [4.0]
2061250   10638850  -8150     16         [5.0, 5.0, 5.0, 5.0]

My code:

import numpy as np
from numpy import genfromtxt
mydata = genfromtxt('myfile.csv', delimiter=',')
arr = np.array(mydata)
col5 = arr[:,4]

I want to read the 5th column from the csv file. However, each element in the 5th column is a list, not a single value.

How can I revise my code?

Bach
Jeremy_Tamu
  • You are calling it a CSV file I see no comma delimiters. What are the actual delimiters? – merlin2011 Jun 18 '14 at 21:20
  • Do you want to read the 5th column as a list or do you want to create a column for each value in the list ? – DavidK Jun 18 '14 at 21:20
  • See the `csv` module for python. And then this: http://stackoverflow.com/questions/20363395/what-is-the-fastest-way-to-convert-string-to-array-in-python. – DrV Jun 18 '14 at 21:21
  • @DavidK I want to read the 5th column as a list and then read all elements from the list and do calculation – Jeremy_Tamu Jun 18 '14 at 21:23
  • @merlin2011: 'CSV file' has become a generic name for any fixed-width format (FWF), whether it has explicit separator chars or not. – smci Sep 18 '14 at 00:41

3 Answers

2

Use pandas to read your csv file and then slice the columns that hold the list. Your bracketed values ([...]) are being taken as NaN, so that has to be avoided. Example below (I use only a few rows, but it works the same way for your whole data):

 >>> import pandas as pd
 >>> import numpy
 >>> df = pd.read_csv("stack.csv",header=None,na_values=" NaN")
 >>> df
                      0         1     2   3      4   5   6   7   8   9      10
            0  2061100  10638650 -8000  25   [1.0   1   1   1   1   1   1.0]
            1  2061800  10639100 -8100  26   [2.0   2   2   2   2   2   2.0]
            2  2061150  10638750 -8250  25   [3.0   3   3   3   3   3   3.0]
            3  2061650  10639150 -8200  25   [4.0   4   4   4   4   4   4.0]
 >>> x = df.loc[:, 4:10]   # .ix was removed in pandas 1.0; .loc slices these integer labels inclusively
 >>> x
             4   5   6   7   8   9      10
         0   [1.0   1   1   1   1   1   1.0]
         1   [2.0   2   2   2   2   2   2.0]
         2   [3.0   3   3   3   3   3   3.0]
         3   [4.0   4   4   4   4   4   4.0]
  >>> x = numpy.array(x)
  >>> x
      array([['[1.0', 1.0, 1.0, 1.0, 1.0, 1.0, ' 1.0]'],
             [' [2.0', 2.0, 2.0, 2.0, 2.0, 2.0, ' 2.0]'],
              ['[3.0', 3.0, 3.0, 3.0, 3.0, 3.0, ' 3.0]'],
              [' [4.0', 4.0, 4.0, 4.0, 4.0, 4.0, ' 4.0]']], dtype=object)
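The first and last cells of each row above are still strings that carry the brackets, so they need cleaning before any calculation. A minimal sketch of that step, assuming every cell is either a number or a string like '[1.0' or ' 1.0]' (the helper name clean_cell is made up for illustration, and this only fits rows whose lists all have the same length):

```python
def clean_cell(cell):
    """Convert one cell such as '[1.0', 1.0 or ' 1.0]' to a float."""
    # str() handles cells that are already numbers; strip() drops
    # surrounding spaces, strip("[]") drops a leading or trailing bracket.
    return float(str(cell).strip().strip("[]"))

# One row copied from the array output above.
row = ['[1.0', 1.0, 1.0, 1.0, 1.0, 1.0, ' 1.0]']
cleaned = [clean_cell(c) for c in row]
print(cleaned)
```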
user2567857
1

Your posted file uses multiple delimiters, which makes it hard for Python to detect which column you actually want. To reach your final goal, why not extract the raw column first and then process it to get your final result? You could filter the whole 5th column into a file, output.txt, where input.txt is the data you posted:

awk -F" " 'BEGIN{OFS=""}{$1="";$2="";$3="";$4=""; print}' input.txt > output.txt

    output:
    [1.0,1.0,1.0,1.0,1.0,1.0,1.0]
    [2.0,2.0,2.0,2.0,2.0,2.0,2.0]
    [3.0,3.0,3.0,3.0,3.0,3.0,3.0]
    [4.0,4.0,4.0,4.0,4.0,4.0,4.0]
    [5.0,5.0,5.0]
    [1.0]
    [2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0]
    [3.0,3.0,3.0,3.0,3.0,3.0]
    [4.0]
    [5.0,5.0,5.0,5.0]
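If you would rather stay in Python, the same extraction can be sketched roughly as below. This assumes the columns are separated by runs of whitespace, as in the posted data, and that the 5th field always starts with '['. The function name fifth_column_text is made up for illustration.

```python
def fifth_column_text(lines):
    """Return the bracketed 5th field of each data line as a string."""
    out = []
    for line in lines:
        # Split on runs of whitespace at most 4 times, so the list
        # (which contains spaces itself) stays in one piece.
        parts = line.split(None, 4)
        if len(parts) < 5 or not parts[4].startswith("["):
            continue  # skips blank lines and the header row
        # Drop the inner spaces to match the awk output above.
        out.append(parts[4].strip().replace(" ", ""))
    return out

# A few lines copied from the posted file.
sample = [
    "  1st        2nd     3rd      4th                     5th",
    "2061350   10638800  -8250     3          [5.0, 5.0, 5.0]",
    "2060950   10638700  -8000     1          [1.0]",
]
result = fifth_column_text(sample)
print(result)
```

For a real file you would pass the open file object instead of the sample list, e.g. fifth_column_text(open('input.txt')).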
Haifeng Zhang
  • Thx for your answer. However, I want to read the 5th column as a list and then read all elements from the list to do further calculation – Jeremy_Tamu Jun 18 '14 at 21:27
  • @PythonFun im a python fun as well, check it here, it may help http://stackoverflow.com/questions/4998629/python-split-string-with-multiple-delimiters – Haifeng Zhang Jun 18 '14 at 21:29
0

To read all the elements of the 5th column and do calculations, you can apply this to each of them:

v = eval(v) 


In [19]: v = '[1,2,3]'

In [20]: eval(v)
Out[20]: [1, 2, 3]
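One caveat: eval runs whatever expression the string contains, so it is risky on data you do not control. A possible alternative, sketched here under the assumption that each field is a plain list literal like the ones in the question, is ast.literal_eval from the standard library, which only accepts Python literals. The wrapper name parse_list_field is made up for illustration.

```python
import ast

def parse_list_field(text):
    """Parse a string such as '[1.0, 2.0]' into a Python list."""
    # literal_eval accepts only literals (numbers, strings, lists, ...),
    # so arbitrary code in the input raises an error instead of running.
    value = ast.literal_eval(text.strip())
    if not isinstance(value, list):
        raise ValueError("expected a list literal, got %r" % (text,))
    return value

values = parse_list_field("[5.0, 5.0, 5.0]")
total = sum(values)
print(values, total)
```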
DavidK