
I'm currently building an API with Flask to upload a CSV file and insert the data from the CSV into PostgreSQL through SQLAlchemy (using flask_sqlalchemy).

Here is an example of the CSV file:

text,phonetic_symbols,fgner_entity,user_id,data_type_id,sns_raw_data_id
hhhh,jj,kk,3,2,1
ll,hh,eee,3,2,1

When I first wrote the code to upload the file, it worked and the data got uploaded.

Original code:

@app.route('/upload', methods=['POST'])
def upload_csv():
    if 'file' not in request.files:
        return jsonify('No file part')
    file = request.files['file']
    if file.filename == '':
        return jsonify(msg='no file selected')
    if file and allowed_file(file.filename):
        filename = secure_filename(file.filename)
        file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
        with open(os.path.join(app.config['UPLOAD_FOLDER'], filename), 'r') as f:
            next(f)
            csv_file = csv.reader(f, delimiter=',')
            for line in csv_file:
                new_dictionary_item = DictionaryItem(text=line[0], phonetic_symbols=line[1], fgner_entity=line[2],
                                                user_id=line[3], data_type_id=line[4], sns_raw_data_id=line[5])
                db.session.add(new_dictionary_item)
                db.session.commit() 
            return jsonify("update sucess"), 200
    return jsonify(msg='not correct file type'), 400

Now I need to add a condition that checks whether every line in the CSV file has exactly 6 fields before uploading the data.

I wrote it as an if condition. The condition works and returns the appropriate response, but the data no longer gets uploaded to my database.

New code with "if" condition:

@app.route('/upload', methods=['POST'])
def upload_csv():
    if 'file' not in request.files:
        return jsonify('No file part')
    file = request.files['file']
    if file.filename == '':
        return jsonify(msg='no file selected')
    if file and allowed_file(file.filename):
        filename = secure_filename(file.filename)
        file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
        with open(os.path.join(app.config['UPLOAD_FOLDER'], filename), 'r') as f:
            next(f)
            csv_file = csv.reader(f, delimiter=',')
            if all(len(single_line) == 6 for single_line in csv_file):
                for line in csv_file:
                    new_dictionary_item = DictionaryItem(text=line[0], phonetic_symbols=line[1], fgner_entity=line[2],
                                                    user_id=line[3], data_type_id=line[4], sns_raw_data_id=line[5])
                    db.session.add(new_dictionary_item)
                    db.session.commit()
                return jsonify("update sucess"), 200
            return jsonify("not correct csv format"), 400
    return jsonify(msg='not correct file type'), 400

The response is "update success" but the data doesn't get uploaded.

What am I doing wrong? Thanks!

EDIT:

Thanks to shrumm for telling me what was wrong with my code and helping me get it to work. I would suggest using psycopg2 as it would be more optimized, but my project is already using SQLAlchemy, so this will do:

@app.route('/upload', methods=['POST'])
def upload_csv():
    if 'file' not in request.files:
        return jsonify('No file part')
    file = request.files['file']
    if file.filename == '':
        return jsonify(msg='No file selected')
    if file and allowed_file(file.filename):
        filename = secure_filename(file.filename)
        file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
        with open(os.path.join(app.config['UPLOAD_FOLDER'], filename), 'r') as f:
            csv_file = csv.reader(f, delimiter=',')
            next(f)
            if all(len(single_line) == 6 for single_line in csv_file):
                f.seek(0)
                next(f)
                for line in csv_file:
                    new_dictionary_item = DictionaryItem(text=line[0], phonetic_symbols=line[1], fgner_entity=line[2],
                                                            user_id=line[3], data_type_id=line[4], sns_raw_data_id=line[5])
                    db.session.add(new_dictionary_item)
                    db.session.commit()
                return jsonify("Update succeed"), 200
            else:
                return jsonify("Not correct csv format"), 400
    return jsonify(msg='Not correct file type'), 400
Linh Nguyen

1 Answer


The reason your code is not working is that you are opening the file once but reading it twice. After your if statement, the for loop never executes because the reader has already consumed the entire file. If you want this to work, you'll need to re-open the file, OR just use something like

csvdata = f.readlines()

and then loop on that to do your validation. That said, there are a lot of ways you can improve this code; it depends on how complex your CSV uploads can get. If it's always a few lines, then what you have above is fine. But if you can expect to receive thousands of lines while your database is busy, writing each line one by one is slow.
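
For example, here is a minimal sketch of that readlines approach inside the upload route. It reuses the DictionaryItem model and UPLOAD_FOLDER config from your question; committing once at the end instead of once per row is my own tweak:

with open(os.path.join(app.config['UPLOAD_FOLDER'], filename), 'r') as f:
    next(f)  # skip the header row
    csvdata = f.readlines()  # cache all lines so the file is only read once

# csv.reader accepts any iterable of strings, including a list of lines
rows = list(csv.reader(csvdata, delimiter=','))
if all(len(row) == 6 for row in rows):  # validate without touching the file again
    for row in rows:
        db.session.add(DictionaryItem(
            text=row[0], phonetic_symbols=row[1], fgner_entity=row[2],
            user_id=row[3], data_type_id=row[4], sns_raw_data_id=row[5]))
    db.session.commit()  # one commit for the whole batch instead of one per line
    return jsonify("update success"), 200
return jsonify("not correct csv format"), 400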

This is a nice article on how to use the COPY command: https://www.dataquest.io/blog/loading-data-into-postgres/

You'll need to use psycopg2 directly instead of SQLAlchemy. I've pasted the relevant code below:

import psycopg2

conn = psycopg2.connect("host=localhost dbname=postgres user=postgres")
cur = conn.cursor()
with open('user_accounts.csv', 'r') as f:
    # Notice that we don't need the `csv` module.
    next(f)  # Skip the header row.
    cur.copy_from(f, 'users', sep=',')
conn.commit()

This writes all the records in the CSV file immediately to the database.

Another tip is that the database can validate the data for you. Add check constraints to the database and specify which columns cannot be null. Then, when you push the CSV in, Postgres will give you an error if the data could not be uploaded. You can catch this error and respond with the appropriate HTTP response. This would save you from having to loop through the file.
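
As a sketch of that idea in your flask_sqlalchemy setup (the column names come from your CSV header; the types, table name, and example CHECK constraint are my assumptions):

class DictionaryItem(db.Model):
    # Hypothetical model definition: types, table name, and the CHECK
    # constraint below are assumptions, not taken from the question.
    __tablename__ = 'dictionary_items'
    id = db.Column(db.Integer, primary_key=True)
    text = db.Column(db.Text, nullable=False)  # NOT NULL: Postgres rejects missing values
    phonetic_symbols = db.Column(db.Text, nullable=False)
    fgner_entity = db.Column(db.Text, nullable=False)
    user_id = db.Column(db.Integer, nullable=False)
    data_type_id = db.Column(db.Integer, nullable=False)
    sns_raw_data_id = db.Column(db.Integer, nullable=False)
    __table_args__ = (
        # reject empty strings as well as NULLs
        db.CheckConstraint("length(text) > 0", name='text_not_empty'),
    )

With constraints like these, a bad row makes copy_from (or session.commit() on the SQLAlchemy side) raise an error that you can catch and turn into the 400 response.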

Something like:

import psycopg2

conn = psycopg2.connect("host=localhost dbname=postgres user=postgres")
with conn.cursor() as cur:
    with open(os.path.join(app.config['UPLOAD_FOLDER'], filename), 'r') as f:
        next(f)  # Skip the header row.
        try:
            cur.copy_from(f, 'users', sep=',')
            conn.commit()
            return jsonify("update success"), 200
        except psycopg2.Error:
            return jsonify("not correct csv format"), 400
# Handle other errors here: could not open the DB connection, could not open your file, etc.
shrumm
  • Thank you, I used your method of opening the file and reading it again (which I admit isn't optimal, but it still works) and now it works. Sadly I can't use psycopg2 because my project uses SQLAlchemy and Alembic to interact with PostgreSQL, so two connection types would not be suitable. I will post my code using your suggested answer :) – Linh Nguyen Jun 20 '19 at 07:56
  • I read about reading the same file multiple times and found this: https://stackoverflow.com/questions/26294912/read-multiple-times-lines-of-the-same-file-python. file.seek() is also a good option instead of opening the file again. – Linh Nguyen Jun 20 '19 at 09:00
  • Cool, I didn't think of that. Though it's probably more efficient to do the readlines approach rather than reading the file twice; reading line by line makes more sense if the file is huge and you're worried about running out of memory. – shrumm Jun 20 '19 at 16:20