0

I want to copy data from a large csv files to DB table(with SSIS), but I have a problem with column SubmittedDate in csv file the client put mess in it (19.janv.50 or 08-AUG-69 ) so It cannot be copied into db table (SubmittedDate is date) How can I clean up the date ? I saw derived column in SSIS,

is this my best solution ? and how can I configure it ?

BKChedlia
  • 327
  • 2
  • 4
  • 18
  • I'm voting to close this question as off-topic because it is a business-rules question, and not a programming question. – Tab Alleman Jul 06 '16 at 14:02
  • Why it's not? I m looking for help ... there's a solution with (DT_DATE)(SUBSTRING... but I don't know how to do it, so If you don't want to help, let other people help me – BKChedlia Jul 06 '16 at 14:06
  • There is no one programmatic solution for cleaning up an undefined "mess" in a date column. Before anybody can answer your question you need to define the rules for what you want to do with this "mess". – Tab Alleman Jul 06 '16 at 14:10
  • @BIDeveloper you love that link. I would say the question of how to import a date is duplicate. however this format is very different than the one suggested as an answer in the lnk. further that answer requires a lot of unnecessary parts, in this particular case I like script component and DateTime.TryParseExact because you don't have to do any string manipulation which is a huge headache with crazy formats...... – Matt Jul 06 '16 at 17:51

2 Answers2

1

Because the format can differ for each record I do not believe this is a good case for derived column. I would use TSQL either as an update where the original data is loaded to a field called original date and an update is run on a column called SubmittedDate. Or have this run during the insert from staging to 'live' table if you are using this approach.

I would start with analysis using a select statement and use a case to determine the parsing you want to use. Keep going until Submitted (parsed) date is no longer null / blank. Hopefully this can get you started:

Select 
   -- 19.janv.50
   Case When CharIndex('.', OriginalDate) = 3 
         And CharIndex('.', OriginalDate, 4) = 8 Then --Substring code goes here.
   -- 08-AUG-69
   Case When CharIndex('-', OriginalDate) = 3 
         And CharIndex('-', OriginalDate, 4) = 7 Then --Substring code goes here.
Joe C
  • 3,925
  • 2
  • 11
  • 31
0

I agree with Joe that a Derived Column would likely get very complex to do this conversion. An alternative to writing a huge SQL update statement to test some formats and have to do string parsing is to use a script component in the data flow task to transform and add a derived column. See my answer on this post Convert varchar to datetime using VB script of how to set one up (that answer is in vb, I usually do c# but the principle is the same)

Note if you are new to scripting in SSIS their are actually 2 places to script. Script Task and Script Component the component is the one you want and will be visible when you are inside the data flow.

Whether you use sql staging table (or secondary column) and tSQL to do the conversion or scripting your first task is to figure out what standards/formats are represented within your dataset. The second decision is what to do when a value doesn't match one of the known formats.

One key to writing your script would be DateTime.ParseExact() and DateTime.TryParseExact(). You can also use try catch blocks. Basically try converting with one of the formats you identify and stop when you get to the one you want. One way to do this would be to use a second function.

Sticking with VB

Private Function ConvertDateString (ByVal dateString As String)
    IF DateTime.TryParseExact(dateString,"dd-MMM-yy", System.Globalization.CultureInfo.InvariantCulture)
         return DateTime.ParseExact(dateString,"dd-MMM-yy", System.Globalization.CultureInfo.InvariantCulture)
    END IF

    IF DateTime.TryParseExact(dateString,"dd.MMM.yy", System.Globalization.CultureInfo.InvariantCulture)
         return DateTime.ParseExact(dateString,"dd.MMM.yy", System.Globalization.CultureInfo.InvariantCulture)
    END IF
End Function

    IF DateTime.TryParseExact(dateString,"MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture)
         return DateTime.ParseExact(dateString,"MM/dd/yyyy", System.Globalization.CultureInfo.InvariantCulture)
    END IF

    IF DateTime.TryParseExact(dateString,"dd/MM/yyyy", System.Globalization.CultureInfo.InvariantCulture)
         return DateTime.ParseExact(dateString,"dd/MM/yyyy", System.Globalization.CultureInfo.InvariantCulture)
    END IF

  ...etc

End Function

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    If Not Row.SampleDate_IsNull Then

        Row.DerivedDate = ConvertDateString(Row.DateStringColumnName)

    End If
End Sub

Note MMM will give you month abbreviations.

Community
  • 1
  • 1
Matt
  • 13,833
  • 2
  • 16
  • 28
  • thankx @Matt for your help, my client said that he use the US date in his csv (12-OCT-10,09-JAN-13, 22-SEP-13) but when I open it with excel (French keyboard) some dates changes : exp 09-JAN-13 becomes 09.janv.13 and the other stay the same, that's how the dates are different – BKChedlia Jul 07 '16 at 06:35
  • 1
    Excel is probably recognizing it as a datetime field and formatting based on your computers Culture/Globalization settings. Open the CSV in something like notepad to see the format of the date because a text editor wont alter the format. So if your client is telling you the truth then you should just need the DateTime.ParseExact(...,"dd-MMM-yy") conversion – Matt Jul 07 '16 at 15:49
  • thanks matt, I changed language in my server and it works – BKChedlia Jul 08 '16 at 07:11