
I am creating a database that will hold a lot of unstructured data. The database is populated from Excel sheets, but the sheets contain some blank columns (EE77, KK12) that I do not want in my database. So far, the program stops where the blank column starts (EE77), but I also want the data from FF888888 and GG121.

This is my code:

from src.server.connectToDB import get_sql_conn
import pandas as pd

if __name__ == '__main__':
    cursor = get_sql_conn().cursor()

    excelFile = pd.ExcelFile('C:\\Users\\dw\\Source\\Repos\\analyse\\data\\Test-nordpool.xlsx')

    a = ["A1", "A2"]
    for i in a:
        df = excelFile.parse(i)
        for key, rows in df.items():
            print("# Kolonne: ", "\n")

            columnInsertSql = "INSERT INTO DataSets (Hour, BlockBuyNet, BlockSell, RejectedBlockBuy, RejectedBlockSell, NetImports) VALUES("

            rowCounter = 1
            for key, column in rows.items():
                columnInsertSql += str(column)

                if rowCounter != len(list(rows.items())):
                    columnInsertSql += ", "

                rowCounter += 1

            columnInsertSql += ")"

            cursor.execute(columnInsertSql)
            print("SQL: " + columnInsertSql)
            cursor.commit()

Result

AA8  BB88  CC888  D88888   EE77  FF888888  KK12  GG121
9    99    999    9999     -     999999    -     1212
10   100   10000  100000   -     1000000   -     121212
11   111   11111  111111   -     1111111   -     1212121
12   122   12222  12222    -     1222222   -     12121212
13   133   13333  13333    -     1333333   -     121212121
14   144   14444  1444444  -     1444444   -     121212121

1 Answer

I think you could use df.shape to find the number of rows, and then df.isnull().sum(), which gives the number of null values in each column. Comparing the two tells you which columns contain only nulls, so you can skip them during processing. The code below is not the most elegant, but it should give you the general idea.

import pandas as pd
import numpy as np

# Example frame: the 'why' column contains only NaN values,
# like the blank columns (EE77, KK12) in your spreadsheet.
raw_data = {'hi': [20, 19, 22, 21],
            'bye': [88, 92, 95, 70],
            'why': [np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(raw_data, columns=['hi', 'bye', 'why'])

number_of_rows = df.shape[0]
print(number_of_rows)

# A column whose null count equals the row count is completely blank.
for column, null_count in df.isnull().sum().items():
    if null_count == number_of_rows:
        print(column, '- blank column, skip it')
    else:
        print(column, '- has data, process it')
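
If the goal is simply to ignore the columns that are completely blank, pandas can also drop them up front with dropna along the column axis. A minimal sketch, reusing the file path and sheet name from the question:

import pandas as pd

excelFile = pd.ExcelFile('C:\\Users\\dw\\Source\\Repos\\analyse\\data\\Test-nordpool.xlsx')
df = excelFile.parse('A1')

# how='all' drops only the columns in which every value is NaN (EE77, KK12),
# leaving FF888888 and GG121 untouched.
df = df.dropna(axis=1, how='all')
print(df.columns)

After that, the remaining columns can be fed into the INSERT loop as before.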