$ grep "^底线$" query_20220922  | wc -l
95701
$ grep -iF "底线" query_20220922  | wc -l
796591

Shouldn't the count be exactly the same? I want to count the exact match of the string.

Cyrus
marlon
  • first of all, you can use `-c` to count number of matching lines (you'll need `-o` and piping to `wc -l` if you can have more than one match per line)... second, can you explain your reasons as to why you expect these two commands to give same result? – Sundeep Sep 24 '22 at 05:42
  • This might help: [The Stack Overflow Regular Expressions FAQ](https://stackoverflow.com/a/22944075/3776858) – Cyrus Sep 24 '22 at 10:21

2 Answers

`-F` matches a fixed string anywhere in a line. `^xyz$` matches lines which contain "xyz" exactly (nothing else).

You are looking for `-x`/`--line-regexp` and not `-F`/`--fixed-strings`.

To match lines which contain your search text exactly, without anything else and without interpreting your search text as a regular expression, combine the two flags: `grep -xF 'findme' file.txt`.

Also, case-insensitive matching (`-i`) can match more lines than case-sensitive matching (the default).
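To make the difference concrete, here is a minimal sketch on a throwaway file. The file name `sample.txt` and its contents are made up for illustration (ASCII is used so that `-i` has a visible effect); only the `grep` flags come from the answer above.

```shell
# Build a small sample file (hypothetical name and contents).
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf '%s\n' 'nba' 'NBA' 'nba scores' 'watch nba live' 'n.a' > "$sample"

# Regex anchored at both ends: only lines that are exactly "nba".
anchored=$(grep -c '^nba$' "$sample")

# Fixed string anywhere on the line, ignoring case.
anywhere=$(grep -ciF 'nba' "$sample")

# Fixed string that must span the whole line (same result as the anchored regex).
whole_line=$(grep -cxF 'nba' "$sample")

# Without -F the dot is a regex wildcard, so -x 'n.a' also matches "nba".
regex_dot=$(grep -cx 'n.a' "$sample")
fixed_dot=$(grep -cxF 'n.a' "$sample")

echo "anchored=$anchored anywhere=$anywhere whole_line=$whole_line"
echo "regex_dot=$regex_dot fixed_dot=$fixed_dot"
```

The anchored regex and `-xF` agree, while the substring search counts every line that merely contains the text.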

knittl

No, they do different things. The first uses a regular expression to search for "底线" alone on an input line (^ in a regular expression means beginning of line, and $ means end of line).

The second searches for the string anywhere on an input line. The `-i` flag does nothing at all here: it selects case-insensitive matching, but this is not well-defined for CJK character sets, so it is basically a no-op. `-F` says to search literally, which makes the search faster for internal reasons but doesn't change the semantics of a search string which doesn't contain any regex metacharacters.

It should be easy to see how they differ. For a large input file, it might be a bit challenging to find the differences if they are not conveniently mixed; but for a quick start, try

diff -u <(grep -m5 '^底线$' query_20220922) <(grep -Fi -m5 '底线' query_20220922)

where -m5 picks out the first five matches. (Try a different range, perhaps with tail, if the differences are all near the end of the file, for example.)
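Another way to see the extra lines directly, sketched here as an assumption rather than part of the original answer, is to take everything the substring search matches and filter out the lines that are exactly the search text with `-v`. The sample file below is hypothetical.

```shell
# Hypothetical sample data standing in for the real query file.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf '%s\n' 'nba' 'nba scores' 'watch nba live' 'other' > "$sample"

# Lines that contain the string but are not exactly the string.
grep -F 'nba' "$sample" | grep -vxF 'nba'

# The same filter, counted instead of printed.
extra_count=$(grep -F 'nba' "$sample" | grep -cvxF 'nba')
echo "extra_count=$extra_count"
```

These are the lines that account for the gap between the two counts in the question.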

Tangentially, you usually want to replace the pipe to `wc -l` with `grep -c`; also, you might want to try `grep -Fx "底线"` as a faster alternative to the first search.
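As a small illustration of the counting options mentioned in this thread, the sketch below compares piping to `wc -l`, `grep -c`, and `grep -o` piped to `wc -l` on a made-up file. The first two count matching lines; the `-o` variant counts individual matches, which differs when a line contains the text more than once.

```shell
# Hypothetical sample: the first line contains the search text twice.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf '%s\n' 'nba nba' 'nba' 'other' > "$sample"

# Matching lines, counted by wc.
lines_wc=$(grep -F 'nba' "$sample" | wc -l)

# Matching lines, counted by grep itself.
lines_c=$(grep -cF 'nba' "$sample")

# Individual matches: -o prints each match on its own line.
occurrences=$(grep -oF 'nba' "$sample" | wc -l)

echo "lines_wc=$lines_wc lines_c=$lines_c occurrences=$occurrences"
```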

tripleee
  • So "grep -Fx" will do exact string match, "grep -x" will do exact regular expression match, and "grep -Fxic" will count the case-insensitive exact string matches. Right? – marlon Sep 24 '22 at 18:11
  • Also, I tried "grep -xFi nba hot_query_20220922 | wc -l" and "grep -xFic nba query_20220922", but the counts are different. Why is that? – marlon Sep 24 '22 at 18:16
  • Looks like the file names are different. – tripleee Sep 24 '22 at 19:14