
When using csvkit, I'm having trouble keeping character data from being converted to numeric data. In the example below, my first column gets converted to an 'int'.

Data: (test.csv)

"BG_ID_10","DisSens_2010","PrivateNeglect_2010"
"250250001001",0.506632168908,0.363523524561
"250250001004",0.346632168908,0.352456136352

Code snippet:

from csvkit import sql as csvkit_sql
from csvkit import table

fh = open('test.csv', 'rb')

csv_table = table.Table.from_csv(
    f=fh,
    name='tname',
    delimiter=',',
    quotechar='"',
    snifflimit=0,
)

for col in csv_table:
    print col.name, col.type

Output:

BG_ID_10 <type 'int'>
DisSens_2010 <type 'float'>
PrivateNeglect_2010 <type 'float'>

I have a working hack, but I'd appreciate any help with better parameters for "from_csv", or alternative suggestions. (Note: after this step, csvkit commands are used to generate Postgres CREATE TABLE statements.)

Working Hack:

char_col = csv_table[0] # get first column
char_col.type = unicode # change type
for idx, val in enumerate(char_col):  # force to unicode
    char_col[idx] = u'%s' % val
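The same coercion can be factored into a small helper that works on any mutable column-like sequence (a sketch only; the `force_text` name is hypothetical, and it uses `'%s' % val` just as the hack above does, which yields text in both Python 2 and 3):

```python
def force_text(column):
    """Coerce every value in a mutable sequence to its string form, in place."""
    for idx, val in enumerate(column):
        column[idx] = '%s' % val
    return column

# Example: numeric block-group IDs become text, so leading digits survive intact.
ids = [250250001001, 250250001004]
force_text(ids)
print(ids)  # ['250250001001', '250250001004']
```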
rprasad

1 Answer


You can add infer_types=False to your from_csv call. All types will become unicode:

BG_ID_10 <type 'unicode'>
DisSens_2010 <type 'unicode'>
PrivateNeglect_2010 <type 'unicode'>

But there's currently no way to specify per-column types without building the Columns yourself.
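For comparison, this all-text behavior is what the stdlib `csv` module does by default: its reader performs no type inference, so every field comes back as a string regardless of quoting (shown here with Python 3's `io.StringIO`; the question's snippet is Python 2):

```python
import csv
import io

# The stdlib csv reader returns every field as text; nothing is
# sniffed into int or float, so long IDs keep their exact digits.
data = io.StringIO(
    '"BG_ID_10","DisSens_2010","PrivateNeglect_2010"\n'
    '"250250001001",0.506632168908,0.363523524561\n'
)
reader = csv.reader(data)
header = next(reader)
row = next(reader)
print(row[0], type(row[0]).__name__)  # 250250001001 str
```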

Quentin Pradet
  • Thanks for the help! I'll build the column myself. (infer_types at the table level will help with another, unrelated piece of code:) – rprasad Mar 07 '16 at 14:39