The csv module can easily give you the column names from the first line, and then the values from the following ones. The hard part will be to guess the correct column types. When you load a csv file into an Excel worksheet, you only have a few types: numeric, string, date.
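As a minimal sketch of that first step (using an in-memory string where you would normally pass an open file):

```python
import csv
import io

# Sample data standing in for a real file; in practice you would use
# open("data.csv", newline="") instead of io.StringIO.
sample = "name,age,city\nAlice,30,Paris\nBob,25,Lyon\n"

with io.StringIO(sample) as f:
    reader = csv.reader(f)
    header = next(reader)   # first line: column names
    rows = list(reader)     # remaining lines: values

print(header)   # ['name', 'age', 'city']
print(rows[0])  # ['Alice', '30', 'Paris']
```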
In a database like MySQL, you can define the size of string columns, and you can give the table a primary key and possibly other indexes. You will not be able to guess that part automatically from a csv file.
The simplest way is to treat all columns as varchar(255). It is really uncommon to have fields in a csv file that do not fit in 255 characters. If you want something more clever, you will have to scan the file twice: the first time to record the maximum size of each column, after which you could take the smallest power of 2 greater than that size. The next step would be to check whether any column contains only integers or floating point values. That begins to be harder to do automatically, because the representation of floating point values may differ depending on the locale: for example 12.51 in an English locale would be 12,51 in a French locale. But Python's locale module can tell you the active locale.
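The width scan and the numeric checks could look roughly like this. The decimal separator is passed in explicitly here as a simplifying assumption; in a real program you would take it from locale.localeconv():

```python
import csv
import io

def next_power_of_two(n):
    """Smallest power of 2 >= n (minimum 1)."""
    p = 1
    while p < n:
        p *= 2
    return p

def is_int(value):
    try:
        int(value)
        return True
    except ValueError:
        return False

def is_float(value, decimal_sep="."):
    # Normalise the locale-specific decimal separator before parsing;
    # in practice decimal_sep would come from locale.localeconv().
    try:
        float(value.replace(decimal_sep, "."))
        return True
    except ValueError:
        return False

# First pass: record the maximum width of each column.
sample = "name,score\nAlice,12.51\nBob,7\n"
with io.StringIO(sample) as f:
    reader = csv.reader(f)
    header = next(reader)
    widths = [0] * len(header)
    for row in reader:
        for i, field in enumerate(row):
            widths[i] = max(widths[i], len(field))

sizes = [next_power_of_two(w) for w in widths]
print(sizes)                    # [8, 8] -> varchar(8) for both columns
print(is_float("12,51", ","))   # True: French-style decimal accepted
print(is_int("12.51"))          # False: not a plain integer
```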
The hardest part would be possible date or datetime fields, because there are many formats, either purely numeric (dd/mm/yyyy or mm/dd/yy) or using plain text (Monday, 29th of September).
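One common approach is to try a list of candidate formats in order with datetime.strptime. The formats below are just examples; note that a value like 01/02/2014 is inherently ambiguous between day-first and month-first, so the order of the list decides:

```python
from datetime import datetime

# Candidate formats to try, in order; extend the list for your data.
DATE_FORMATS = ["%d/%m/%Y", "%m/%d/%y", "%Y-%m-%d"]

def parse_date(value):
    """Return a date if value matches any known format, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None

print(parse_date("29/09/2014"))  # 2014-09-29
print(parse_date("not a date"))  # None
```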
My advice would be to define a default mode, for example all strings, or just integers and strings, and use configuration parameters or even a configuration file to fine-tune the conversion per column.
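Such per-column configuration can be as simple as a dict mapping column names to converter functions, with everything else falling back to str (the names and shape here are hypothetical, just to illustrate the idea):

```python
# A hypothetical per-column configuration: the default is str, and the
# caller overrides conversion only for specific columns.
def convert_row(header, row, converters, default=str):
    return [converters.get(name, default)(value)
            for name, value in zip(header, row)]

header = ["name", "age", "score"]
converters = {"age": int, "score": float}  # everything else stays a string

row = convert_row(header, ["Alice", "30", "12.51"], converters)
print(row)  # ['Alice', 30, 12.51]
```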
For the reading part, the csv module will give you everything you need.