
I'm trying to import a flat file directly into one of our databases through the Import and Export Wizard, but the source file has random line breaks in it.

For example, this is how it should be structured:

Column 1 | column 2 | column 3 | column 4

but the source file has problem areas where it randomly breaks a line, like so:

column input 1 | column input 2 | 
(empty row)
column input 3 | column input 4 ... 

Results

column input 1 | column input 2 | Blank | Blank column input 3 | column input 4 ...

And this is causing almost everything to feed into the table incorrectly. I've seen some potential solutions to this using conditional splits, but none that can account for the fact that the breaks are not consistent in which column they start on. A break could occur on row 10 or on row 352, and the source file is simply too big to fix this manually.

I'm also occasionally getting a data flow error: "The column delimiter for column 'CompensatingFactors' was not found."

  • I've tried fixing it with a conditional split in SSIS
  • Adjusting the row delimiter between LF and CRLF
  • Adjusting the Code Page from 65001 to 1252
  • My boss keeps suggesting changing the variable types, but this hasn't helped
  • Is there anything characteristic of the ends of the incomplete rows that is not found on the complete rows, such as a trailing space? If there is such a feature, then sed, awk, Vim, or another tool of your choice could process the flat file and join the broken rows. – rd_nielsen Oct 09 '19 at 16:58
  • The only way in the past I have seen to fix this, and it's subject to how you get the original delimited file, is to have the flat file qualify your fields with ". This helps the parser know that a new line is part of a field and not a new row altogether. With that said, if your data also contains " in its fields then this may also cause problems. – LinkOps Oct 09 '19 at 17:06
  • @LinkOps I do have the text qualifier set to ". – Curtis Rhoa Oct 09 '19 at 17:18
  • @CurtisRhoa Is it qualified on export into the flat file or are you only setting the qualifier upon import with SSIS? It needs to be qualified on export as well – LinkOps Oct 09 '19 at 17:36
  • Looks like the new line characters should be treated as a part of a value. BCP can be configured to do that. In the past I used a script source and parsed such files in C#. – Piotr Palka Oct 09 '19 at 18:16

1 Answer


One simple technique is to import the flat file line by line, rather than column by column. Then process the resulting table: drop the empty rows and split each line on the separator, e.g. using STRING_SPLIT.
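
A minimal T-SQL sketch of that idea, assuming the file has already been loaded one raw line per row into a staging table (the dbo.RawLines table and column names here are hypothetical). Note that the ordinal argument of STRING_SPLIT requires SQL Server 2022 or later:

    -- Hypothetical staging table: one raw line of the flat file per row.
    CREATE TABLE dbo.RawLines (
        LineId   INT IDENTITY(1, 1) PRIMARY KEY,  -- preserves the original file order
        LineText NVARCHAR(MAX) NULL
    );

    -- (Load the file here with a single-column flat file source, one line per row.)

    -- Drop the empty rows produced by the stray line breaks.
    DELETE FROM dbo.RawLines
    WHERE LineText IS NULL OR LTRIM(RTRIM(LineText)) = '';

    -- Split each remaining line on the pipe separator.
    SELECT r.LineId,
           s.ordinal AS ColumnPosition,
           s.value   AS ColumnValue
    FROM dbo.RawLines AS r
    CROSS APPLY STRING_SPLIT(r.LineText, '|', 1) AS s  -- ordinal argument needs SQL Server 2022+
    ORDER BY r.LineId, s.ordinal;

This stages and splits the data, but it does not by itself re-join the rows that were broken across lines; that still needs extra logic, for example counting delimiters per line and stitching consecutive lines together until a full row's worth is reached.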

user1443098
  • That won't help. If the field for column 2 has carriage returns in it then the field will span across multiple lines. How would you then know whats a valid new row and what is merely a continuation of the previous row? – LinkOps Oct 09 '19 at 17:35
  • So, how do you know when a row ends? – user1443098 Oct 09 '19 at 18:22
  • @user1443098 in a flat file, rows end at the end of a line. New lines are typically considered to be a new row of data. – LinkOps Oct 11 '19 at 14:49
  • So, what character signals a new line? From your note yesterday, I guess it isn't a carriage return. – user1443098 Oct 11 '19 at 15:02
  • From my previous work in data warehousing we used to replace char(10) - Line Feed, and char(13) - carriage return as per the decimal values from this ascii table http://www.asciitable.com/ – LinkOps Oct 11 '19 at 15:05
  • Sure, but previously you said "If the field for column 2 has carriage returns in it then the field will span across multiple lines." So are you only looking for line feed (char(10)) to detect new line? – user1443098 Oct 11 '19 at 17:48