
I've been trying for hours to do the following to a file that I'm converting from CSV to pipe-delimited format. After the conversion, I want to remove only the literal pipe that ends up inside a field, between two delimiter pipes. I don't know if that is possible.

Example:

Original input:

X, Y, This is a | test for me, or 
X, Y, This is a|test for me,

Original output:

| X | Y | This is a | test for me| or 
|X|Y|This is a|test for me|

Desired output:

| X | Y | This is a test for me|

I have tried, but I just can't do it: I can't find the right regex for sed. Regular expressions have always been hard for me.

I'm new to C and to scripting. The conversion already works, including input like `Street name, apt number,`: the comma between the street name and the apartment number is removed, and the one after the number, which is the delimiter, is converted to a pipe.

I run `cat` piped through several `sed` commands to handle other things. Do you think it is best to do this there, and will it work on the 1k+ rows I have? Part of the script also uses `awk`, which I'm not familiar with either.

Is my approach the best solution, or should I handle the pipes before I even convert to pipe-delimited? I think the script also encloses cases like "street name, apt #" in double quotes, so that it can remove just the comma inside the quotes.

No luck with several attempts, including:

cat <input> | sed 's/ | / /g' | tr , '|'

or:

cat <input> | sed 's/ | / /g;s/,/\|/g'

The script handles the commas as described above. I need to add similar handling for pipes like the ones in my example, because otherwise a literal pipe divides my string into two fields.

Anyone want to help?

Mr Robot
  • How is it related to C? Do not spam tags. – 0___________ Aug 12 '21 at 08:07
  • As an aside, you want to avoid the [useless use of `cat`](https://stackoverflow.com/questions/11710552/useless-use-of-cat) – tripleee Aug 12 '21 at 08:23
  • If you are using Awk elsewhere in the pipe, you should probably refactor everything into the Awk script. Awk can do everything `sed` can do (and `cat`, and `cut`, `head`, etc). – tripleee Aug 12 '21 at 08:25
  • @tripleee The script was handed to me by a senior developer, and I don't know Awk. It handles everything for the commas, but I can't figure it out well enough to replicate it for pipes, and even if I understood it, I wouldn't know how to write it. – Mr Robot Aug 12 '21 at 08:38
  • Please [don’t post images of code, error messages, or other textual data.](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors) – tripleee Aug 12 '21 at 08:54
  • That script is just too riddled with bad practices and apparent errors. The `wc -l` probably masks the error which the following line was supposed to catch. But `[ ! $? ]` is never going to be true; you are checking if the variable is empty, which it never will be. The use of temporary files which are not cleaned up is a security problem. Seriously, get somebody who knows a bit of shell script to audit this. Maybe try posting on our sibling site Code Review Stack Exchange. – tripleee Aug 12 '21 at 08:57
  • remove the unwanted pipes **before** adding your new pipe delimiters – markp-fuso Aug 12 '21 at 17:28
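Following tripleee's suggestion to fold the pipeline into Awk and markp-fuso's advice to remove the stray pipes first, here is a minimal sketch. It assumes every comma in the input is a delimiter (no quoted fields) and produces the compact `X|Y|...|` form rather than the spaced `| X | Y |` form; the function name `csv_to_pipe` is hypothetical. Awk reads its input directly, so no `cat` is needed.

```shell
#!/bin/sh
# Hypothetical helper: csv_to_pipe FILE... (reads stdin when no file is given).
# Assumption: every comma is a delimiter; there are no quoted fields.
csv_to_pipe() {
    awk '{
        # 1. Literal pipe plus any surrounding spaces -> one space.
        #    Done first, before any delimiter pipes exist.
        gsub(/ *[|] */, " ")
        # 2. Comma plus any surrounding spaces -> pipe delimiter.
        gsub(/ *, */, "|")
        print
    }' "$@"
}
```

To catch a failure, test the command itself instead of `[ ! $? ]`, for example (file names are placeholders): `if ! csv_to_pipe input.csv > output.txt; then echo "conversion failed" >&2; exit 1; fi`.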

2 Answers


This should do:

echo "X, Y, This is a | test for me" | sed  's/ |//;s/, /|/g'
X|Y|This is a test for me
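The OP asked for an explanation of each part. Here is the same command split into its two substitutions and run step by step on the sample line; the variable names are only for illustration.

```shell
#!/bin/sh
line='X, Y, This is a | test for me'

# s/ |//  : delete the first " |" (a space followed by a pipe).
#           In sed's default basic regex (BRE), a bare | is an ordinary
#           character. There is no trailing g, so only the first match
#           on each line is removed.
step1=$(printf '%s\n' "$line" | sed 's/ |//')
printf '%s\n' "$step1"    # X, Y, This is a test for me

# s/, /|/g : replace every ", " (comma + space) with a pipe.
#            The g flag applies it to all matches on the line.
step2=$(printf '%s\n' "$step1" | sed 's/, /|/g')
printf '%s\n' "$step2"    # X|Y|This is a test for me
```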
Jotne
  • What if the text is "X, Y, This is a|test for me", with no spaces around the pipe in a|test? It works if the pipe has spaces before and after it, but not without them. Would you mind explaining the regex? I would also like to fully understand it. – Mr Robot Aug 12 '21 at 08:18
  • @MrRobot Add a question mark after the space to make it optional `sed 's/ ?|//;s/, /|/g'` – Jotne Aug 12 '21 at 08:22
  • Then please edit your question to provide full requirements, or better yet accept this answer and ask a new question with your _actual_ requirements. See also the guidance for providing a [mre]. – tripleee Aug 12 '21 at 08:22
  • @Jotne I appreciate it. It does what I want; I guess running the same sed over all 1k records will just work, so now I need to figure that part out. Could you explain each part of the expression in your answer so I can understand it? I get some parts, but not all. – Mr Robot Aug 12 '21 at 08:36
  • This last one still fails: with `sed 's/ ?|//;s/, /|/g'`, both `X, Y, This is a | test for me,` and `X, Y, This is a|test for me,` keep the |. – Mr Robot Aug 12 '21 at 08:50
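The `?` suggestion fails because sed uses basic regular expressions (BRE) by default, where `?` is an ordinary character: `s/ ?|//` searches for the literal text ` ?|`, which never occurs. (GNU sed accepts `\?` as an extension in BRE; with `sed -E`, `?` is an operator, but `|` then means alternation and must be written `\|`.) Even a working optional-space version would turn `a|test` into `atest`. A sketch that avoids both problems, assuming the trailing comma should also become a pipe:

```shell
#!/bin/sh
# Two sample lines from the question (without the trailing "or").
input='X, Y, This is a | test for me,
X, Y, This is a|test for me,'

# Broken: in BRE, " ?|" is literal text, so the first substitution never
# matches; and ", " does not match the final comma, which has no space after it.
broken=$(printf '%s\n' "$input" | sed 's/ ?|//;s/, /|/g')

# Fix: " *" means zero or more spaces, and [|] is a bracket expression that
# matches a literal pipe. The pipe and its surrounding spaces become exactly
# one space, so "a|test" becomes "a test"; then each comma (with any
# surrounding spaces) becomes a delimiter pipe.
fixed=$(printf '%s\n' "$input" | sed 's/ *[|] */ /g;s/ *, */|/g')

printf '%s\n' "$broken" "$fixed"
```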

Based solely on the limited set of input data, some assumptions:

  • ignore the trailing `or` on the first line of the sample input, since `or` does not show up in the expected output; otherwise OP needs to provide details on the logic for when to remove trailing strings
  • input data does not contain any literal commas (,) inside fields, i.e., all commas are delimiters
  • output lines have a space separating each field from the | delimiter, which means the last field should have a trailing space before the final |, just like the 1st/2nd fields show a trailing space in the expected output
  • all input/output lines end with a delimiter (, or |)
  • all output lines begin with a | delimiter
  • all white space consists of actual spaces, i.e., no need to deal with tabs
  • NOTE: if the question is updated with more details, some of these assumptions can be removed and the proposed code updated accordingly ...

Sample input data:

$ cat raw.csv
X, Y, This is a | test for me,
X, Y, This is a|test |  for me  |  ,

One sed idea:

sed -E 's/[ ]*\|[ ]*/ /g; s/^[ ]*/\| /g; s/[ ]*,[ ]*$/ \|/g; s/[ ]*,[ ]*/ | /g' raw.csv

Where:

  • 1st sub replaces any number of spaces + `|` + any number of spaces with a single space [removes the unwanted `|` before adding the `|` delimiters]
  • 2nd sub replaces the start of line + any number of spaces with `| ` (a pipe followed by a single space)
  • 3rd sub replaces any number of spaces + `,` + any number of spaces + end of line with ` |` (a single space followed by a pipe)
  • 4th sub replaces any number of spaces + `,` + any number of spaces with ` | ` (a pipe with a single leading and trailing space)

This generates:

| X | Y | This is a test for me |
| X | Y | This is a test for me |
markp-fuso
  • OK, so I have a CSV file with 50k records. The script encloses literal commas inside double quotes so they are not handled as delimiters, and removes them. Then the actual delimiter commas get converted to `|`. The issue is that some of the strings look like `X, Y, this a|test for me,`, so after converting to `|`, the `test for me` part is treated as a new field. I need to handle all of these cases in my file. So, as someone mentioned, I should first remove all `|` from the file and then convert it to pipe-delimited. – Mr Robot Aug 12 '21 at 18:18
  • @MrRobot I'm not sure I understand your comment ... does the proposed answer work or not? if 'not' then please update the question with more details (to refute my assumptions) and/or provide more sample inputs for which this answer does not work – markp-fuso Aug 12 '21 at 18:22
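For the fuller case described in the comment above (literal commas inside double quotes, stray pipes inside fields), a single Awk pass can do both steps in the suggested order: stray pipes first, then delimiters. This is a sketch under assumptions the question does not confirm: quotes are never escaped (no `""` inside a field), a quoted field never spans lines, both the quote characters and the quoted commas are dropped, and the helper name `quoted_csv_to_pipe` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: quoted_csv_to_pipe FILE... (reads stdin when no file).
# Assumptions: no escaped quotes, no multi-line quoted fields; commas inside
# "..." are dropped, the quote characters are dropped, stray pipes become a
# single space, and every other comma becomes a | delimiter.
quoted_csv_to_pipe() {
    awk '{
        gsub(/ *[|] */, " ")          # stray pipes first, before delimiters exist
        out = ""
        inq = 0                        # 1 while inside a quoted section
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\"") { inq = !inq; continue }   # toggle state, drop quote
            if (c == ",") {
                if (inq) continue      # comma inside quotes: drop it
                c = "|"                # delimiter comma: becomes a pipe
            }
            out = out c
        }
        gsub(/ *[|] */, "|", out)     # trim spaces around the delimiters
        print out
    }' "$@"
}

printf '%s\n' 'X, "Main St, Apt 4", This is a|test for me,' | quoted_csv_to_pipe
# X|Main St Apt 4|This is a test for me|
```

Awk processes one line at a time, so the same call works on a whole file, e.g. `quoted_csv_to_pipe input.csv > output.txt` (placeholder names).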