The csv module can easily give you the column names from the first line, and then the values from the following ones. The hard part will be to guess the correct column types. When you load a csv file into an Excel worksheet, you only have a few types: numeric, string, date.
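As a minimal sketch of that first step (using an in-memory string where you would normally pass an open file):

```python
import csv
import io

# Sample data standing in for a real file; in practice you would use
# open("data.csv", newline="") instead of io.StringIO.
sample = "name,age,city\nAlice,30,Paris\nBob,25,Lyon\n"

with io.StringIO(sample) as f:
    reader = csv.reader(f)
    header = next(reader)   # first line: column names
    rows = list(reader)     # remaining lines: values

print(header)   # ['name', 'age', 'city']
print(rows[0])  # ['Alice', '30', 'Paris']
```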
In a database like MySQL, you can define the size of string columns, and you can give the table a primary key and possibly other indexes. You will not be able to guess that part automatically from a csv file.
The simplest way is to treat all columns as varchar(255). It is really uncommon to have fields in a csv file that do not fit in 255 characters. If you want something more clever, you will have to scan the file twice: the first time to record the maximum size of each column, after which you could take the smallest power of 2 greater than that size. The next step would be to check whether any column contains only integers or floating point values. That begins to be harder to do automatically, because the representation of floating point values may differ depending on the locale: for example 12.51 in an English locale would be 12,51 in a French locale. But Python's locale module can tell you the active locale.
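The width scan and the numeric checks could look roughly like this. The decimal separator is passed in explicitly here as a simplifying assumption; in a real program you would take it from locale.localeconv():

```python
import csv
import io

def next_power_of_two(n):
    """Smallest power of 2 >= n (minimum 1)."""
    p = 1
    while p < n:
        p *= 2
    return p

def is_int(value):
    try:
        int(value)
        return True
    except ValueError:
        return False

def is_float(value, decimal_sep="."):
    # Normalise the locale-specific decimal separator before parsing;
    # in practice decimal_sep would come from locale.localeconv().
    try:
        float(value.replace(decimal_sep, "."))
        return True
    except ValueError:
        return False

# First pass: record the maximum width of each column.
sample = "name,score\nAlice,12.51\nBob,7\n"
with io.StringIO(sample) as f:
    reader = csv.reader(f)
    header = next(reader)
    widths = [0] * len(header)
    for row in reader:
        for i, field in enumerate(row):
            widths[i] = max(widths[i], len(field))

sizes = [next_power_of_two(w) for w in widths]
print(sizes)                    # [8, 8] -> varchar(8) for both columns
print(is_float("12,51", ","))   # True: French-style decimal accepted
print(is_int("12.51"))          # False: not a plain integer
```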
The hardest part would be possible date or datetime fields, because there are many formats, either purely numeric (dd/mm/yyyy or mm/dd/yy) or using plain text (Monday, 29th of September).
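One common approach is to try a list of candidate formats in order with datetime.strptime. The formats below are just examples; note that a value like 01/02/2014 is inherently ambiguous between day-first and month-first, so the order of the list decides:

```python
from datetime import datetime

# Candidate formats to try, in order; extend the list for your data.
DATE_FORMATS = ["%d/%m/%Y", "%m/%d/%y", "%Y-%m-%d"]

def parse_date(value):
    """Return a date if value matches any known format, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None

print(parse_date("29/09/2014"))  # 2014-09-29
print(parse_date("not a date"))  # None
```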
My advice would be to define a default mode, for example all strings, or just integers and strings, and use configuration parameters or even a configuration file to fine-tune the conversion per column.
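Such per-column configuration can be as simple as a dict mapping column names to converter functions, with everything else falling back to str (the names and shape here are hypothetical, just to illustrate the idea):

```python
# A hypothetical per-column configuration: the default is str, and the
# caller overrides conversion only for specific columns.
def convert_row(header, row, converters, default=str):
    return [converters.get(name, default)(value)
            for name, value in zip(header, row)]

header = ["name", "age", "score"]
converters = {"age": int, "score": float}  # everything else stays a string

row = convert_row(header, ["Alice", "30", "12.51"], converters)
print(row)  # ['Alice', 30, 12.51]
```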
For the reading part, the csv module will give you everything you need.