I have massive text files that use ASCII character 1 (SOH) as the column delimiter and ASCII character 2 (STX) as the row delimiter. I have been looking at how to find and replace these characters, but I'm having a lot of trouble because I can't even work out how to refer to them. I think I should be using a utility called sed, but I have read the whole sed man page and am none the wiser about special characters.

I want to replace each SOH with a | and each STX with a carriage return and line feed. Does anyone know how to do this?

billinkc
user3775501
  • Convert the ASCII into an array of bytes (characters) and you'll have an easier time with this. – perry Oct 24 '14 at 14:47
  • Sed is for unix. Are you on Unix? – paparazzo Oct 24 '14 at 15:08
  • In C#: s = s.Replace("\u0001", "|").Replace("\u0002", "\r\n"); (Replace returns a new string, so the result has to be assigned.) – perry Oct 24 '14 at 15:10
  • How do you feel about leaving the special characters in the file and just importing it with those delimiters? – billinkc Oct 24 '14 at 16:09
  • @billinkc I am pretty sure the OP wants to parse the data. – paparazzo Oct 24 '14 at 16:11
  • @blam They've tagged the question with SSIS and they are changing the delimiters to something that corresponds to an out-of-the-box selection for the tool. If the problem they are trying to solve is "how can I use SSIS to import a file with weird column and row delimiters", then we have an existing answer. – billinkc Oct 24 '14 at 16:16
  • I expect we'll be able to close this as a duplicate of [How to read a flatfile with lowercase thorn as the delimiter](http://stackoverflow.com/questions/20388031/how-to-read-a-flatfile-with-lowercase-thorn-as-the-delimiter/20390353#20390353). – billinkc Oct 24 '14 at 16:33
  • I performed the same steps as in the duplicate candidate above, except that I used _x0001_ as the column delimiter and _x0002_ as the row delimiter. I was able to import the data just fine. Making this a separate comment so that when the dupe logic runs, the relevant bits are preserved. – billinkc Oct 24 '14 at 16:34
  • @billinkc, this link is a brilliant solution. But as I have a number of imports to do, I'll use the solution provided by perry: create a Windows app in C# that converts all the files using the key line he provided. BTW, for anyone facing the same issue, I had also tried the utility at http://sourceforge.net/projects/findandreplace/, which worked for normal files but crashed on anything over half a gigabyte. – user3775501 Oct 25 '14 at 10:20

1 Answer

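You can do it using tr, which lets you refer to the characters by their octal codes (\001 for SOH, \002 for STX):

cat <your_file> | tr '\001' '|' | tr '\002' '\n'

Note that tr maps one character to exactly one character, so this produces LF line endings only. If you need a carriage return plus line feed, add a further step (for example unix2dos, or awk with ORS set to "\r\n").

If you want to convert several files, you can use the find command to list them and its -exec flag to run the line above on each one. Since -exec does not interpret pipes itself, the pipeline has to be wrapped in sh -c.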
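The per-file conversion and the find -exec step could be sketched roughly as below. This is only a sketch under some assumptions: the function names convert_file and convert_tree are made up for illustration, the output is written next to each input with a .txt suffix (an arbitrary choice), and the data is assumed to contain no literal line feeds of its own, since those would also end up as row breaks.

```shell
#!/bin/sh
# convert_file IN OUT (hypothetical helper name)
# SOH (octal 001) becomes '|' and STX (octal 002) becomes LF via tr;
# awk then reprints each line with CR LF as the output record separator,
# because tr cannot turn one character into two.
convert_file() {
    tr '\001\002' '|\n' < "$1" | awk -v ORS='\r\n' 1 > "$2"
}

# convert_tree DIR PATTERN (hypothetical helper name)
# Runs the same pipeline on every file under DIR whose name matches PATTERN.
# find -exec cannot run a pipeline directly, so it is wrapped in sh -c.
# Each result is written to <original name>.txt; the original is untouched.
convert_tree() {
    find "$1" -type f -name "$2" -exec sh -c '
        for f in "$@"; do
            tr "\001\002" "|\n" < "$f" | awk -v ORS="\r\n" 1 > "$f.txt"
        done
    ' sh {} +
}
```

For example, convert_file export.dat export.txt would convert one file, and convert_tree /data '*.dat' would convert every .dat file under /data (both paths are placeholders). Pick a pattern that does not also match the generated .txt files. Both tr and awk stream their input, so file size should not be a problem in the way it was for the GUI tool mentioned in the comments.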

arutaku