17

I have a huge file of e-mail addresses and I would like to count how many addresses it contains. How can I do that using the Windows command line?

I have tried this, but it just prints the matching lines. (By the way, all the e-mail addresses are on a single line.)

findstr /c:"@" mail.txt

Luke Girvin
Patryk

9 Answers

21

Using what you have, you could pipe the results through a find. I've seen something like this used from time to time.

findstr /c:"@" mail.txt | find /c /v "GarbageStringDefNotInYourResults"

So you are counting the lines produced by your findstr command that do not contain the garbage string. It is kind of a hack, but it could work for you. Alternatively, just use find /c on the string you do care about. Lastly, you mentioned one address per line, so in that case the above works, but it breaks when a line contains multiple addresses.

Adam S
  • @Patryk, my mistake, I misread and thought the emails were each on their own line. Will revise. – Adam S Feb 16 '12 at 14:23
14

Why not simply use this? It determines the number of lines containing at least one @ character:

find /C "@" "mail.txt"

Example output:

---------- MAIL.TXT: 96

To avoid the file name in the output, change it to this:

find /C "@" < "mail.txt"

Example output:

96

To capture the resulting number and store it in a variable, use this (change %N to %%N in a batch file):

set "NUM=0"
for /F %N in ('find /C "@" ^< "mail.txt"') do set "NUM=%N"
echo %NUM%
aschipfl
4

Using grep for Windows

Very simple solution:

grep -o "@" mail.txt | grep -c .

Remember the dot at the end of the line!

Here is a slightly more understandable way:

grep -o "@" mail.txt | grep -c "@"

The first grep selects only the "@" strings and puts each one on a new line.

The second grep counts the lines (or the lines with @).
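
To see why -o matters when every address sits on one line, here is a minimal sketch. It assumes a POSIX-style shell (for example under WSL) with grep available; the sample addresses and the temporary file are made up for illustration, and the printf/mktemp parts are shell-specific rather than cmd syntax.

```shell
# Hypothetical sample data: three addresses on a single line.
sample=$(mktemp)
printf 'a@example.com;b@example.com;c@example.com\n' > "$sample"

# -c counts matching lines, so a one-line file yields 1.
lines=$(grep -c "@" "$sample")

# -o prints each match on its own line; the second grep counts those lines.
occurrences=$(grep -o "@" "$sample" | grep -c "@")

echo "lines=$lines occurrences=$occurrences"
rm -f "$sample"
```

With this sample the line count is 1 while the occurrence count is 3, which is the difference the question is about.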

The grep utility can be easily installed from the grep-for-Windows page. It is a very small and safe text filter. grep is one of the most useful Unix/Linux commands, and I use it daily on both Linux and Windows. The Windows findstr is good, but it lacks some of grep's features.

Installing grep on Windows will be one of your best decisions if you like the CLI or batch scripts.

Download and Installation

  1. Download the latest version from the project page https://sourceforge.net/projects/grep-for-windows/. A direct link to the file is https://sourceforge.net/projects/grep-for-windows/files/grep-3.5_win32.zip/download.
  2. Unzip the ZIP archive. It contains a single file.
  3. Put the grep.exe file in the C:\Windows directory or in any other directory on the system path (run echo %PATH% to list them). That is all.

Test if grep is working:

  • Open a command-line window (cmd)
  • Run the command grep --help

Uninstallation

Delete the grep.exe file from the folder where you placed it.

DigiBat
  • 'grep' is not recognized as an internal or external command, operable program or batch file. – Zimba Nov 23 '19 at 13:12
  • everytime I go to install grep I think, why is this soooo much work. And what is the point of that wingrep site, does it even have a download? – Gerry Jan 11 '23 at 18:38
  • Thank you for pointing out the outdated web link. I have updated my answer after 7 years. grep vanished from the former second page, and the former first page contains a version that needs more than one file to be installed. Hopefully you'll agree now that unzipping and copying a single file to a folder isn't such a complicated installation procedure. – DigiBat Jan 11 '23 at 20:35
  • For a single file, you can simplify this: `grep -o "@" mail.txt | grep -c .` to just: `grep -c "@" mail.txt` – OvalOlive Mar 17 '23 at 05:43
3

Maybe it's a little late, but the following script worked for me. The source file contained quote characters, which is why I used the 'usebackq' parameter. The caret sign (^) acts as the escape character in the Windows batch scripting language.

@setlocal enableextensions enabledelayedexpansion    
SET TOTAL=0
FOR /F "usebackq tokens=*" %%I IN (file.txt) do (
    SET LN=%%I
    FOR %%J IN ("!LN!") do (
        FOR /F %%K IN ('ECHO %%J ^| FIND /I /C "searchPhrase"') DO (
            @SET /A TOTAL=!TOTAL!+%%K
        )
    )
)
ECHO Number of occurrences is !TOTAL!
paranoid
1

OK - way late to the table, but... it seems many respondents missed the original spec that all email addresses occur on 1 line. This means unless you introduce a CRLF with each occurrence of the @ symbol, your suggestions to use variants of FINDSTR /c will not help.

Among the Unix tools for DOS is the very powerful SED.exe. Google it. It rocks RegEx. Here's a suggestion:

find "@" datafile.txt | find "@" | sed "s/@/@\n/g" | find /n "@" | SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/">CountChars.bat

Explanation (assuming the file with the data is named "Datafile.txt"):

1) The 1st FIND includes 3 lines of header info, which throws off a line-count approach, so pipe the results to a 2nd (identical) find to strip off the unwanted header info.

2) Pipe the above results to SED, which will search for each "@" character and replace it with itself+ "\n" (which is a "new line" aka a CRLF) which gets each "@" on its own line in the output stream...

3) When you pipe the above output from SED into the FIND /n command, you'll be adding line numbers to the beginning of each line. Now, all you have to do is isolate the numeric portion of each line and preface it with "SET /a" to convert each line into a batch statement that (increasingly with each line) sets the variable equal to that line's number.

4) isolate each line's numeric part and preface the isolated number per the above via:
| SED "s/\[\(.*\)\].*/Set \/a NumFound=\1/"

In the above snippet, you're piping the previous command's output to SED, which uses the syntax "s/WhatToLookFor/WhatToReplaceItWith/" to do these steps:

a) look for a "[" (which must be "escaped" by prefacing it with "\")

b) begin saving (or "tokenizing") what follows, up to the closing "]"

    --> in other words it ignores the brackets but stores the number
    --> the ".*" that follows the bracket wildcards whatever follows the "]"

c) the stuff between the \( and the \) is "tokenized", which means it can be referred-to later, in the "WhatToReplaceItWith" section. The first stuff that's tokenized is referred to via "\1" then second as "\2", etc.

So... we're ignoring the [ and the ] and we're saving the number that lies between the brackets and IGNORING all the wild-carded remainder of each line... thus we're replacing the line with the literal string: Set /a NumFound= + the saved, or "tokenized" number, i.e. ...the first line will read: Set /a NumFound=1 ...& the next line reads: Set /a NumFound=2 etc. etc.

Thus, if you have 1,283 email addresses, your results will have 1,283 lines.

The last one executed = the one that matters.

If you use the ">" character to redirect all of the above output to a batch file, i.e.: > CountChars.bat

...then just call that batch file & you'll have a DOS environment variable named "NumFound" with your answer.
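
The core of the pipeline above, splitting the line after each "@" and counting the resulting lines, can also be sketched without the generated batch file. This is only an illustration under assumptions: a POSIX-style shell, GNU sed (which treats \n in the replacement as a newline), and grep in place of FIND for the final count; the sample data and the variable name num_found are hypothetical.

```shell
# Hypothetical sample data: four addresses on a single line.
sample=$(mktemp)
printf 'a@example.com b@example.com c@example.com d@example.com\n' > "$sample"

# Break the line after every "@", then count the lines that contain one.
num_found=$(sed 's/@/@\n/g' "$sample" | grep -c "@")

echo "NumFound=$num_found"
rm -f "$sample"
```

Here the count comes straight from the last command instead of from the highest generated "Set /a NumFound=" line.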

Corb
1

I found this on the net. See if it works:

findstr /R /N "^.*certainString.*$" file.txt | find /c "@"
gentrobot
  • Thanks for that, but `findstr /R /N "^.*@.*$" mail.txt | find /c "@"` returns 1 for me. Maybe that's an issue with the results being on one line. – Patryk Feb 16 '12 at 07:53
  • I think this would work but my text is not just one line but a 200KB JSON, so I got `FINDSTR: Line 1 is too long.` – Csaba Toth Jan 18 '21 at 19:43
1

I would install the unix tools on your system (handy in any case :-), then it's really simple - look e.g. here:

Count the number of occurrences of a string using sed?

Using awk:

awk '$1 ~ /title/ {++c} END {print c}' FS=: myFile.txt

You can get the Windows unix tools here:

http://unxutils.sourceforge.net/
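
Note that the awk one-liner above counts lines whose first field matches, not individual occurrences. A minimal sketch that counts every "@" instead, assuming a POSIX-style shell and any standard awk (the sample file contents are made up for illustration):

```shell
# Hypothetical sample data: two addresses on line 1, one on line 2, none on line 3.
sample=$(mktemp)
printf 'a@example.com b@example.com\nc@example.com\nno address here\n' > "$sample"

# gsub returns the number of substitutions made on the current line,
# so summing it over all lines gives the total number of "@" characters.
total=$(awk '{ c += gsub(/@/, "@") } END { print c + 0 }' "$sample")

echo "$total"
rm -f "$sample"
```

Replacing "@" with itself leaves the text unchanged; only the return value of gsub is used.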

Community
TheEye
  • Just `grep -c -o @` would suffice (or `grep -o @ | wc -l` if you have a buggy `grep` which doesn't DTRT). – tripleee Feb 16 '12 at 09:04
  • Also the `awk` script counts lines with occurrences, not actual occurrences. It's not hard to fix, but easier still to use `grep -o`. – tripleee Feb 16 '12 at 09:06
  • @tripleee I know I can use unix tools (which are much easier to use), but I would like to do it with the Windows command line. – Patryk Feb 16 '12 at 09:35
  • More power to you, then, or actually, less. (^: – tripleee Feb 16 '12 at 13:26
0

This is how I do it, using an AND condition with FINDSTR (to count number of errors in a log file):

SET COUNT=0
FOR /F "tokens=4*" %%a IN ('TYPE "soapui.log" ^| FINDSTR.exe /I /R^
 /C:"Assertion" ^| FINDSTR.exe /I /R /C:"has status VALID"') DO (
  REM counts number of lines containing both "Assertion" and "has status VALID"
  SET /A COUNT+=1
)
SET /A PASSNUM=%COUNT%

NOTE: This counts "number of lines containing string match" rather than "number of total occurrences in file".

djangofan
0

Use this:

type file.txt | find /i "@" /c
Pang
  • While this code snippet may solve the question, [including an explanation](http://meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – Rosário Pereira Fernandes Mar 28 '17 at 21:25