1

I'm new to regex and hope someone can help me. I'm using the regex below to grep a CSV file for lines that contain exactly one pipe character (i.e. |).

grep "^([^\\|]+\\|){1}[^\\|]+$" myfile.csv

Unfortunately, the above yields no result when used with grep. Any ideas?

Sample CSV file content is shown below; I expect the 2nd line to be found.

"foo"|"foo"|"foo"

"bar"|"bar"

Solutions to this question:

grep -E "^([^|]+\|){1}[^|]+$" myfile.csv

and

egrep "^[^|]+\|[^|]+$" myfile.csv
Rebecca Abriam
  • 1
    This is exactly the kind of thing that regexes really shouldn't be used for: they're not good at counting. Your language/framework of choice may very well have a `str.count()` method or function; it certainly has a `str.find()` that would be much more appropriate. – jscs Sep 20 '13 at 02:57
  • 1
    @JoshCaswell I agree that this might be easier if using a language that has something like this, but it's also perfectly fine for regex (and there are certainly applications for regex where there is no host language available as you suggest). As the OP shows, she's using `grep`. – Phrogz Sep 20 '13 at 03:01
  • You may want to specify the `-E` flag to `grep` in order to get full "extended" regex support. – Phrogz Sep 20 '13 at 03:04
  • @Phrogz: It's easy enough to substitute grep for another more appropriate tool. – jscs Sep 20 '13 at 03:12
  • 1
    Thanks for your responses! But this is purely for an ad hoc task (i.e. grep only) to find problematic entries in a CSV file. I would have done it differently if I were using it in my code. :) Btw, thanks a lot @Phrogz for the `-E` tip, and @arshajii for the note about escaping `|`. This one now works perfectly: **`grep -E "^([^|]+\|){1}[^|]+$" myfile.csv`** – Rebecca Abriam Sep 20 '13 at 04:05
  • @RebeccaAbriam You should probably post your own solution since it's the only one working, otherwise it might confuse people coming to this thread. – plalx Sep 20 '13 at 04:07
  • @RebeccaAbriam Also, perhaps you should change your selected answer, since that solution isn't even in the solution list you specified. – plalx Sep 20 '13 at 12:31
  • See also [Count occurrences of character per line field](http://stackoverflow.com/q/8629410) and (on [Unix.SE]) [How to count the number of a specific character in each line?](http://unix.stackexchange.com/q/18736) – jscs Sep 20 '13 at 19:44

5 Answers

4

You can try:

^[^|]*\|[^|]*$

You don't need to escape | in a character class. Also you presumably want * instead of + here to allow for strings like |abc, xyz|, and just | on its own.
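
As a minimal sketch of the difference between `*` and `+` here, the two patterns can be compared with `grep -E` (the extended-regex flag suggested in the comments on the question). The sample lines and the temporary file are made up for illustration.

```shell
# Hypothetical sample data: every line has exactly one pipe except the last.
sample=$(mktemp)
printf '%s\n' '|abc' 'xyz|' '|' 'a|b' 'a|b|c' > "$sample"

# With *, the text on either side of the single pipe may be empty.
star=$(grep -E '^[^|]*\|[^|]*$' "$sample")

# With +, at least one non-pipe character is required on each side.
plus=$(grep -E '^[^|]+\|[^|]+$' "$sample")

printf 'star:\n%s\n\nplus:\n%s\n' "$star" "$plus"
rm -f "$sample"
```

The `*` version keeps the first four lines, while the `+` version keeps only `a|b`. Without `-E`, GNU grep reads `\|` as alternation rather than a literal pipe, which would be consistent with the pattern matching every line.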

arshajii
  • Thanks for the info about escaping '|'. I also tried this pattern earlier, but it returns all lines (both with 1 and 2 '|'). – Rebecca Abriam Sep 20 '13 at 04:00
1

Try the following:

^[^|]+\|[^|]+$
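
As a hedged illustration, this is the pattern run against the sample from the question with `grep -E`; the temporary file is an assumption made for the demo. Without `-E`, grep uses basic regular expressions, in which an unescaped `+` is a literal character, so the same pattern would not behave as intended.

```shell
# Recreate the two sample lines from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' '"foo"|"foo"|"foo"' '"bar"|"bar"' > "$sample"

# -E enables extended regex, so + is a quantifier and \| is a literal pipe.
one_pipe=$(grep -E '^[^|]+\|[^|]+$' "$sample")

printf '%s\n' "$one_pipe"
rm -f "$sample"
```

Only the `"bar"|"bar"` line is printed.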

plalx
  • Thanks, but I forgot to mention that I tried that regex pattern too before the grouped one I used in my question. But this pattern does not return any result. – Rebecca Abriam Sep 20 '13 at 03:59
  • No, `*` instead of `+` because there is no requirement that it not start or end with a `|` – pguardiario Sep 20 '13 at 09:46
  • @pguardiario, Well, that was never specified, and since the initial OP's regex was using the `+` repetition operator, I assumed that's what he wanted. You should also look at the **Solutions to this question:** part of his post. He is using `+`, and you will see that my answer is in the solution list while the selected answer isn't ;) – plalx Sep 20 '13 at 12:27
  • So because his wrong solution uses `+`, you're arguing that the correct solution must use `+`? I'm sorry this is a weak and weird argument. – pguardiario Sep 20 '13 at 13:12
  • @pguardiario, Who are you to decide that it's a wrong solution when **the OP himself edited his question by adding working solutions** to his problem? Note that these use the `+` operator. I initially assumed that the data format was rigid and couldn't contain *empty values*, based on the fact that the initial OP's regex was using the `+` operator, and you assumed the opposite based on what? I'm sorry, but it seems you are the one making invalid assumptions here. Will you now downvote the other answers using `*`? I hope not. – plalx Sep 20 '13 at 18:02
1

Solution using awk

awk 'gsub(/\|/,"|")==1' file

gsub(/\|/,"|") replaces every | with itself and returns the number of replacements made. If that number equals 1, awk performs the default action, which is to print $0.

Edit: Another awk:

awk 'split($0,a,"|")==2' file

This counts how many parts the line is divided into by |; if there are 2, the line is printed.
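
A small sketch of both awk variants on the question's sample data, plus one extra line without any pipe that is made up for the demo. The temporary file and the array name `parts` are assumptions for illustration only.

```shell
# Question's sample lines, plus a hypothetical line with no pipe at all.
sample=$(mktemp)
printf '%s\n' '"foo"|"foo"|"foo"' '"bar"|"bar"' 'no pipes here' > "$sample"

# gsub returns the number of substitutions; replacing | with | leaves the line unchanged.
by_gsub=$(awk 'gsub(/\|/, "|") == 1' "$sample")

# split returns the number of pieces; exactly one pipe yields two pieces.
by_split=$(awk 'split($0, parts, "|") == 2' "$sample")

printf 'gsub:  %s\nsplit: %s\n' "$by_gsub" "$by_split"
rm -f "$sample"
```

Both commands select only the `"bar"|"bar"` line.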

Jotne
0

Here are the solutions to my question. Thanks to the comments that led me to solving this.

grep -E "^([^|]+\|){1}[^|]+$" myfile.csv

and

egrep "^[^|]+\|[^|]+$" myfile.csv
Rebecca Abriam
0

Grep and regexes are the wrong tool for this task. Use something that is intended for counting:

# Use a split function with the pipe as delimiter
awk 'split($0, _, "|") == 2 {print}' the_file

# Set awk's field separator to the pipe character
# and check the number of fields on each line
awk -F'|' 'NF == 2 {print}' the_file
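
As a rough sketch of the counting idea, the same field-separator approach can report the number of pipes on each line, which may help when hunting for problematic entries. The temporary file stands in for `the_file`, and the output format is an arbitrary choice.

```shell
# Recreate the two sample lines from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' '"foo"|"foo"|"foo"' '"bar"|"bar"' > "$sample"

# With | as the field separator, a non-empty line contains NF - 1 pipes.
counts=$(awk -F'|' '{ printf "%d: %s\n", NF - 1, $0 }' "$sample")

printf '%s\n' "$counts"
rm -f "$sample"
```

This prints each line prefixed with its pipe count, here 2 for the first line and 1 for the second.
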
jscs