I want to read a CSV file so that a specified column is loaded as string type. The data file is located here:
Please download it and save it as $HOME/cbond.csv
(I can't upload it to Dropbox or other cloud drives because of the GFW. Jianguoyun provides an English GUI; create your own free account to download my sample data file.)
import pandas as pd
df = pd.read_csv('cbond.csv', sep=',', header=0, converters={'正股代码': str})
I load the column 正股代码 from the CSV file as string type using converters, then check the data types of all columns with df.info().
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 239 entries, 0 to 238
Data columns (total 17 columns):
代码 239 non-null int64
转债名称 239 non-null object
现价 239 non-null float64
涨跌幅 239 non-null float64
正股名称 239 non-null object
正股价 239 non-null float64
正股涨跌 239 non-null float64
转股价 239 non-null float64
回售触发价 239 non-null float64
强赎触发价 239 non-null float64
到期时间 239 non-null object
剩余年限 239 non-null float64
正股代码 239 non-null object
转股起始日 239 non-null object
发行规模 239 non-null float64
剩余规模 239 non-null object
转股溢价率 239 non-null float64
dtypes: float64(10), int64(1), object(6)
Why is the column 正股代码 shown as
正股代码 239 non-null object
instead of the following?
正股代码 239 non-null string
I tried upgrading pandas:
sudo apt-get install --upgrade python3-pandas
Reading package lists... Done
Building dependency tree
Reading state information... Done
python3-pandas is already the newest version (0.19.2-5.1).
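I then tried different statements. Note that the interpreter below reports pandas 0.24.2 (installed under /usr/local/lib/python3.5/dist-packages), not the 0.19.2 apt package: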
>>> import pandas as pd
>>> pd.__version__
'0.24.2'
>>> test_1 = pd.read_csv('cbond.csv',dtype={'正股代码':'string'})
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/dtypes/common.py", line 2011, in pandas_dtype
    npdtype = np.dtype(dtype)
TypeError: data type "string" not understood

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 490, in pandas._libs.parsers.TextReader.__cinit__
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/dtypes/common.py", line 2017, in pandas_dtype
    dtype))
TypeError: data type 'string' not understood
>>> test_2 = pd.read_csv('cbond.csv',dtype={'正股代码':'str'})
>>> test_2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 239 entries, 0 to 238
Data columns (total 17 columns):
代码 239 non-null int64
转债名称 239 non-null object
现价 239 non-null float64
涨跌幅 239 non-null float64
正股代码 239 non-null object
正股名称 239 non-null object
正股价 239 non-null float64
正股涨跌 239 non-null float64
转股价 239 non-null float64
回售触发价 239 non-null float64
强赎触发价 239 non-null float64
到期时间 239 non-null object
剩余年限 239 non-null float64
转股起始日 239 non-null object
发行规模 239 non-null float64
剩余规模 239 non-null object
转股溢价率 239 non-null float64
dtypes: float64(10), int64(1), object(6)
memory usage: 31.8+ KB
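One way to check whether the values really are strings, whatever label df.info() prints, is to look at the Python type of the elements themselves. The following is only a minimal sketch: the two-row CSV text is a made-up stand-in for cbond.csv (the real file is not reproduced here), and the sample codes are hypothetical.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for cbond.csv
csv_text = "代码,正股代码\n113001,000001\n113002,600000\n"

# Same idea as test_2 above: ask for str on one column only
df = pd.read_csv(io.StringIO(csv_text), dtype={'正股代码': str})

values = df['正股代码'].tolist()
print(values)  # → ['000001', '600000'] (leading zeros kept)

# Each element is a plain Python str, regardless of the dtype label
print(all(isinstance(v, str) for v in values))

# The dtype label itself depends on the pandas version
print(df['正股代码'].dtype)
```

If this behaves the same way on the real file, the "object" label would just be how this pandas version reports a column of Python strings. pandas 1.0 and later also accept a dedicated 'string' extension dtype; 0.24.2 predates it, which is consistent with the TypeError above.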