0

I am writing/storing data in MySql via python. If columns data in MySql contains "NAN" then how can I handle them. At the moment I know how to deal with empty spaces or None values but here I got stuck. the data looks like this

# file no 2
# some description line
#"Time","col.no","dat.va","all_val","graph_da","tem","vel","acc", "all.valu","rand"

"2016-02-25 14:50:00",0,12.94,20.88,17.87,"NAN",14,14,13.9,"NAN"
"2016-02-25 15:00:00",1,13.48,20.86,18,"NAN",13.4,13,13.2,"NAN"
"2016-02-25 15:10:00",2,13.89,20.84,18.07,18.03,12.8,12,12.6,13 
"2016-02-25 21:30:00",3,15.76,20.18,17.96,18.25,0.09,0,0.07,0.16
"2016-02-25 21:40:00",4,15.72,,17.91,,0.08,"NAN",0.09,0.18
"2016-02-25 21:50:00",5,15.66,20.07,17.86,18.17,0.03,"NAN",0.05,0.09
"2016-02-25 23:30:00",6,14.98,19.26,17.12,17.44,"NAN","NAN","NAN","NAN"

there are several thousands line which includes randomly this "NAN" therefore It would be great if somebody give me logic to deal, e.g whenever it comes across NAN, it put NAN or none. my code to deal with empty spaces and None is like this.

with open(filepath) as f:
    lines = f.readlines()
max_lines = len(lines)
for k, line in enumerate(lines):
    if k >= (int(skip_header_line) + int(index_line_number)):
        data_tmp = line.strip().split(',')

        strDate = data_tmp[0].replace("\"", "")
    strDate = strDate.split('.')[0]   
    timestamp = datetime.datetime.strptime(strDate, '%Y-%m-%d %H:%M:%S') 
    ts = calendar.timegm(timestamp.timetuple())           
  #  _ts = ts * 1000000 

        data_buffer = [ts] + data_tmp[1:]                                                                    
        data_buffer = [ts] + data_tmp[1:]
        data_buffer = [v if v is not "" and v is not " " else None for v in data_buffer]
        print data_buffer
        cursor.execute(add_data, data_buffer)
        cnx.commit()
        with open(marker_file, "w") as f:
            f.write(" ".join([ str(item[0]), str(data_tmp[0]), str(max_lines),
                   str(k-int(skip_header_line)+1) ]))
cursor.close()
cnx.close()

Any help or tip would be greatly appreciated because I try my best to find the solution but I found nothing useful . Thanx in advance for help.

Rio
  • 347
  • 3
  • 6
  • 20
  • 1
    Why not simply pre-process the entire csv or dat file, and replace `"NaN"` with empty_string? You will not even need to write a python script for that, just a simple text editor search-and-replace will work. When the data are imported to MySQL, they will be interpreted as null values, just like the null values that appear on line 4 in the sample data you provided in your question. – dreftymac Dec 07 '16 at 22:30
  • @dreftymac thanx for the useful tips. I want exactly this thing so that in pre process it replace "NAN" with empty string, but I am unable to make the relevant changes. Can you show where I can make those changes. I would be really grateful for this help. – Rio Dec 07 '16 at 22:35
  • 1
    This question already has an answer on stackoverflow. See e.g., http://stackoverflow.com/a/4128192/42223. Take a look at that first then come back here if you still need help. – dreftymac Dec 07 '16 at 22:48
  • @dreftymac thanx for help. I will give it a look and will come if I need help. thanx for the tip. – Rio Dec 07 '16 at 22:50
  • this does not have to be this complex. Please take a look at the following python script. https://github.com/dreftymac/public.lab/blob/master/temp/throwaway.py – dreftymac Dec 08 '16 at 00:23
  • @dreftymac thanx for the code. I try like this got this problem . ."2016-02-25 14:50:00",0,12.94,20.88,17.87,'None',14,14,13.9,'None' . and error is ............Data truncated for column 'tem' at row 1 it mean I am successfull in replacing NAN with none , but no clue why this error. I tried this code. ............................................................................................. for item in index_list.items(): with open(item[1]) as f: lines = f.readlines() lines = [rec.replace('"NAN"', 'None') for rec in lines] – Rio Dec 08 '16 at 00:41

1 Answers1

2

Don't write your own CSV parsing code. Use the module.

Apart from that you can use lambda functions to filter lists, as shown below. There's one function that removes empty lines and comment lines from the csv, and one that converts "NAN" to None on each row.

import csv
import calendar
from datetime import datetime

with open('foo.csv', newline='') as csvfile:
    data_lines = lambda l: l[0] != '#' and l.strip()
    NAN_2_None = lambda i: None if i == "NAN" else i

    csvreader = csv.reader(filter(data_lines, csvfile), delimiter=',', quotechar='"')
    for row in csvreader:
        dt = datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
        row[0] = calendar.timegm(dt.timetuple())
        data = map(NAN_2_None, row)

        # cursor.execute(add_data, data)
        print(list(data))

prints

[1456411800, '0', '12.94', '20.88', '17.87', None, '14', '14', '13.9', None]
[1456412400, '1', '13.48', '20.86', '18', None, '13.4', '13', '13.2', None]
[1456413000, '2', '13.89', '20.84', '18.07', '18.03', '12.8', '12', '12.6', '13 ']
[1456435800, '3', '15.76', '20.18', '17.96', '18.25', '0.09', '0', '0.07', '0.16']
[1456436400, '4', '15.72', '', '17.91', '', '0.08', None, '0.09', '0.18']
[1456437000, '5', '15.66', '20.07', '17.86', '18.17', '0.03', None, '0.05', '0.09']
[1456443000, '6', '14.98', '19.26', '17.12', '17.44', None, None, None, None]

However, it will probably be faster to bulk-load the csv with MySQL, think about pre-processing the file with a simple tool like sed and use that approach instead.

Tomalak
  • 332,285
  • 67
  • 532
  • 628
  • I am extremely sorry for late reply. I was stuck in some other things. I applied your solution now and its works perfectly fine. I am accepting your answer. But I am thankful that you point this Bulk insertion the csv with MySql. I will give a try to use sed and do some research. I will come back if I am stuck with bulk insertion. – Rio Dec 09 '16 at 14:33
  • Okay, glad to hear that this worked for you. Follow-ups please as new questions, not as comments here. – Tomalak Dec 09 '16 at 14:35
  • Well... I can't stop you, but it's not nice to poke people directly. Also there's no guarantee that I could even help you with a mySQL issue. It's much better if you ask a self-contained question that stands on its own. People will come and try to answer it. – Tomalak Dec 09 '16 at 14:44
  • Thanks for the useful tips . I will follow these guidelines. – Rio Dec 09 '16 at 14:48