
I am currently facing an issue with loading CSV files directly from blob storage into our Azure SQL instance. I have the credentials correctly set up and am able to access the files without any issue. However, the BULK INSERT fails at a certain row. I have tried the exact same file with the exact same BULK INSERT configuration on our local SQL Server and it works without any issue.

    BULK INSERT dbo.this_is_a_table FROM
        'transformed/this_is_a_csv.csv'
    WITH (
        DATA_SOURCE = 'this_is_a_blob_storage_data_source'
        ,FORMAT = 'CSV' 
        ,FIELDTERMINATOR = '|' 
        ,FIELDQUOTE = '"'
        ,ROWTERMINATOR = '0x0a' 
        ,BATCHSIZE=100000 
        ,TABLOCK
        ,FIRSTROW = 2
        ,CODEPAGE = '65001'
    )
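
For context, the external data source and credential were set up along the usual lines for BLOB_STORAGE bulk access - roughly as below, with the storage account, container, and SAS token as placeholders (the credential name is illustrative):

    -- Requires a database master key to already exist.
    CREATE DATABASE SCOPED CREDENTIAL blob_sas_credential      -- illustrative name
    WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
         SECRET = '<SAS token, without the leading ?>';

    CREATE EXTERNAL DATA SOURCE this_is_a_blob_storage_data_source
    WITH (
        TYPE = BLOB_STORAGE,
        LOCATION = 'https://<storageaccount>.blob.core.windows.net/<container>',
        CREDENTIAL = blob_sas_credential
    );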

I have tried swapping FORMAT = 'CSV' for DATAFILETYPE = 'char', cycling through the different ROWTERMINATOR values, etc., without any luck so far.

The error is as follows:

Failed to execute query. Error: Bulk load failed due to invalid column value in CSV data file transformed/this_is_a_csv.csv in row 3128, column 10. The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

Loading the data with LASTROW = 3127 works perfectly fine. I checked the CSV file and there is nothing that stands out about row 3128 - it is basically identical to any other row in the file.
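
If it helps with reproducing this, the raw bytes of the file can also be pulled straight into SQL via OPENROWSET with SINGLE_BLOB, so the region around row 3128 can be inspected as hex on the server side - a rough sketch against the same data source (finding the byte offset of that row is still a manual step):

    SELECT DATALENGTH(f.BulkColumn) AS file_bytes          -- total size of the blob, as a sanity check
         , SUBSTRING(f.BulkColumn, 1, 200) AS sample_bytes -- swap 1 for the offset of row 3128 to see its raw bytes
    FROM OPENROWSET(
        BULK 'transformed/this_is_a_csv.csv',
        DATA_SOURCE = 'this_is_a_blob_storage_data_source',
        SINGLE_BLOB
    ) AS f;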

Any ideas or suggestions on what else I could try?

Sample data:

"MANDT"|"EBELN"|"BUKRS"|"BSTYP"|"BSART"|"BSAKZ"|"LOEKZ"|"STATU"|"AEDAT"|"ERNAM"|"LIFNR"|"EKORG"|"WAERS"|"BEDAT"|"FRGGR"|"FRGSX"|"FRGKE"|"FRGZU"|"FRGRL"|"ABSGR"|"PROCSTAT"
"100"|"4620739302"|"1000"|"F"|"UB"|""|""|"L"|"20230118"|"84HD937J3J392H371842CAZEF91374923HDTAZ32"|"0001000001"|"AB01"|"USD"|"20230118"|"A1"|"S1"|"F"|"XX"|""|"00"|"05"
"100"|"5729302783"|"1000"|"F"|"UB"|""|""|"L"|"20230118"|"84HD937J3J392H371842CAZEF91374923HDTAZ32"|"0001000002"|"AB01"|"USD"|"20230118"|"A1"|"S1"|"F"|"XX"|""|"00"|"05"
"100"|"5720562944"|"1000"|"F"|"UB"|""|""|"L"|"20230118"|"84HD937J3J392H371842CAZEF91374923HDTAZ32"|"0001000003"|"AB01"|"USD"|"20230118"|"A1"|"S1"|"F"|"XX"|""|"00"|"05"
"100"|"8401817494"|"1000"|"F"|"UB"|""|""|"L"|"20230118"|"84HD937J3J392H371842CAZEF91374923HDTAZ32"|"0001000004"|"AB01"|"USD"|"20230118"|"A1"|"S1"|"F"|"XX"|""|"00"|"05"
fschopper
  • If you create a copy of the CSV file and remove that row, then process that copy of the file, what happens? Does the file get processed successfully, or does it fail on the next row? – Alberto Morillo Aug 04 '23 at 21:35
  • Have you actually looked at that row in a hex editor/viewer? It sounds like there's an invalid byte sequence on that row. A possibility comes to mind: have you been concatenating UTF-8 files together? There could be a UTF-8 BOM (Byte Order Mark) on the row that's confusing the reader; BOMs should only appear at the very start of a file. – AlwaysLearning Aug 04 '23 at 23:15
  • Can you provide a sample input of rows which get an error and rows which don't? – Pratik Lad Aug 05 '23 at 09:24
  • @AlbertoMorillo I tried removing both the faulty row as well as the row prior. It then fails at the exact same row again, regardless of which one I removed. – fschopper Aug 05 '23 at 18:22
  • @PratikLad Added an example to the post. The insert fails on the second row. – fschopper Aug 05 '23 at 18:23
  • @AlwaysLearning No, the file has been created as a single file. I've tried reading in the file using python and re-exporting it again using various encodings. No luck in doing so. Pandas is able to read the file without any issues. – fschopper Aug 05 '23 at 18:27
  • Can you split the CSV file into chunks of 3,000 rows and process each of the smaller files generated by the split operation? https://stackoverflow.com/questions/20721120/how-to-split-csv-files-as-per-number-of-rows-specified – Alberto Morillo Aug 05 '23 at 19:30

1 Answer


I also tried to load a CSV file with your sample data from a storage account into Azure SQL and got a similar error.


This error can occur for many reasons, such as:

  • Incorrect row terminators or field terminators.
  • Table column sizes that are too small for the incoming data.

To resolve this error, I checked that the size of each column was aligned with the incoming data and that the data type for each column was correct, and changed ROWTERMINATOR to '\n'.
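
For reference, a table definition along these lines matches the posted sample header - every column nvarchar, with sizes guessed from the sample rows (the sizes are assumptions, not the original definition):

CREATE TABLE sample2 (
    MANDT    nvarchar(10),
    EBELN    nvarchar(20),
    BUKRS    nvarchar(10),
    BSTYP    nvarchar(5),
    BSART    nvarchar(10),
    BSAKZ    nvarchar(5),
    LOEKZ    nvarchar(5),
    STATU    nvarchar(5),
    AEDAT    nvarchar(10),
    ERNAM    nvarchar(50),
    LIFNR    nvarchar(20),
    EKORG    nvarchar(10),
    WAERS    nvarchar(10),
    BEDAT    nvarchar(10),
    FRGGR    nvarchar(5),
    FRGSX    nvarchar(5),
    FRGKE    nvarchar(5),
    FRGZU    nvarchar(10),
    FRGRL    nvarchar(5),
    ABSGR    nvarchar(5),
    PROCSTAT nvarchar(5)
);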

BULK INSERT sample2
FROM 'csv1.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage'
        ,FORMAT = 'CSV' 
        ,FIELDTERMINATOR = '|' 
        ,FIELDQUOTE = '"'
        ,ROWTERMINATOR = '\n' 
        ,BATCHSIZE=100000 
        ,TABLOCK
        ,FIRSTROW = 2
        ,CODEPAGE = '65001');

OUTPUT:

(screenshot showing the rows loaded successfully)

Pratik Lad
  • Thanks for the response! A quick search of the raw file gives the following: "\n\r" - 0 hits, "\r\n" - 0 hits, "\r" - 0 hits, "\n" - 21168 hits. I have set the data types and field lengths all to NVARCHAR(MAX) (also tried e.g. VARCHAR(1000) just in case) and ensured that all columns are correctly created and in the correct order (shouldn't matter in the case of NVARCHAR(MAX), though). Still getting the same error. Also tried '0x0A' instead of '\n' without any luck. Slowly but steadily running out of ideas here... – fschopper Aug 07 '23 at 19:04