Below are my code, its output, and the raw CSV data. According to the output, every column has dtype object. Is there a way to have each column recognized as a string, and the last column as a float? I am using Python 2.7 with Miniconda.

Code:

import pandas as pd
sample=pd.read_csv('123.csv', sep=',',header=None)
print sample.dtypes

Program output:

0    object
1    object
2    object
3    object

123.csv content:

c_a,c_b,c_c,c_d
hello,python,pandas,1.2

Edit 1:

sample = pd.read_csv('123.csv', header=None, skiprows=1,
    dtype={0:str, 1:str, 2:str, 3:str})
print sample.dtypes

0    object
1    object
2    object
3    object
dtype: object

Edit 2:

sample = pd.read_csv('123.csv', header=None, skiprows=1,
    dtype={0:str, 1:str, 2:str, 3:str})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('float32')
print sample.dtypes

c_a     object
c_b     object
c_c     object
c_d    float32

regards, Lin

Lin Ma
  • Possible duplicate of [Specifying dtype with pandas.read\_csv](http://stackoverflow.com/questions/15210962/specifying-dtype-with-pandas-read-csv) – Merlin Aug 27 '16 at 00:56

1 Answer

You have to use the `dtype` argument. Since you do not want the header row treated as data, you must skip it with `skiprows=1`; otherwise the fourth element of that row, `c_d`, cannot be parsed as a float.

df = pd.read_csv('123.csv', header=None, skiprows=1,
            dtype={0:str, 1:str, 2:str, 3:float})

The output is:

       0       1       2    3
0  hello  python  pandas  1.2
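
To confirm the result, the dtypes can be inspected directly. The following is a minimal sketch, assuming the same two-line CSV as in the question, inlined through `io.StringIO` so that it runs without the file (that inlining and the name `csv_text` are assumptions for illustration). On pandas versions of that era, string columns are reported as `object`, so only the last column shows a numeric dtype.

```python
import io

import pandas as pd

# Same content as 123.csv in the question, inlined for illustration.
csv_text = u"c_a,c_b,c_c,c_d\nhello,python,pandas,1.2\n"

# Skip the header row and read the last column as float.
df = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1,
                 dtype={0: str, 1: str, 2: str, 3: float})

# Column 3 is float64; columns 0-2 hold ordinary Python strings.
print(df.dtypes)
print(type(df.iloc[0, 0]))
```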

EDIT:

To add a header with different types to your DataFrame, you can use:

df.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 4.])

and the output is:

     c_a     c_b     c_c  4.0
0  hello  python  pandas  1.2
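
As an alternative sketch that is not part of the original answer: column labels and column dtypes are independent in pandas, so the header row can be kept as the column names while `c_d` is still parsed as a float, by keying `dtype` on the names instead of the positions. The CSV is again inlined through `io.StringIO`, and the names `csv_text` and `named` are assumptions for illustration.

```python
import io

import pandas as pd

# Same content as 123.csv in the question, inlined for illustration.
csv_text = u"c_a,c_b,c_c,c_d\nhello,python,pandas,1.2\n"

# Let read_csv use the first row as the header (the default) and
# specify each column's dtype by its name.
named = pd.read_csv(io.StringIO(csv_text),
                    dtype={'c_a': str, 'c_b': str, 'c_c': str, 'c_d': float})

# The label 'c_d' stays a string; the values in that column are floats.
print(named.columns.tolist())
print(named.dtypes)
```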
gabra
  • Thanks gabra, upvoted. Is there a way to make the data column names the same as the header column names? In my example, I want to use `c_a,c_b,c_c,c_d` as the 4 column names in pandas. – Lin Ma Aug 27 '16 at 03:10
  • But then your fourth column header `c_d` needs to be a float, which it is not. You would have to change it. Is that ok? – gabra Aug 27 '16 at 03:13
  • Yes, that is ok with me. Could you show me the solution? – Lin Ma Aug 27 '16 at 03:14
  • Added to the answer. – gabra Aug 27 '16 at 03:21
  • Thanks gabra, upvoted. So it seems I can either use the header names, or load with specific data types without using the header names directly, but cannot achieve both goals? :) – Lin Ma Aug 27 '16 at 03:24
  • I am sorry, but what do you mean? This way you get the fourth column and the fourth item in the header as float, and all the others as strings. – gabra Aug 27 '16 at 03:27
  • Sorry gabra, I mean: is there no way to name the 4th column `c_d`, its header name, and keep the float type? – Lin Ma Aug 27 '16 at 03:32
  • Oh, ok. No, you can't have `c_d` as a float. But you can have `4.0` as a float or a string. – gabra Aug 27 '16 at 03:36
  • Thanks gabra for the clarification, upvoted. See my updated post with the Edit 1 section: it looks like your code does not work for me, the data types are still object rather than str. I am using Python 2.7 in a Miniconda environment, btw. – Lin Ma Aug 27 '16 at 03:39
  • There is no problem. They are still treated as strings. Take a look at this answer: http://stackoverflow.com/a/21020411/2029132 – gabra Aug 27 '16 at 03:42
  • Thanks gabra, I tried converting the type to float32 explicitly, and it seems to work. See my Edit 2, is that code correct? – Lin Ma Aug 27 '16 at 03:51
  • Yes. But the `c_d` in your header is still a string and not a float. To check, type `df.columns`. – gabra Aug 27 '16 at 04:01
  • Thanks for all the help gabra, I have marked your reply as the answer. Have a good weekend. :) – Lin Ma Aug 27 '16 at 04:24