0

I have a bunch of files that contains thousands of records.The structure of each file is same.

Each record is on a separate line and has multiple fields separated by a delimiter '|'.

Each row should have 36 fields, but the problem is some of these rows has <>35 fields, i.e. <>35 '|' characters.

Can someone please suggest a way in windows, by which I can identify the row. (Like record with delimiters <>35 should be written to bad file).

Ross Ridge
  • 38,414
  • 7
  • 81
  • 112
  • When you say <>35 I am not sure I am interpreting you correctly. 35 fields that are basically `|||||||...|` or more or less than the expected number of what I assume is 35 delimiters? – Compass Sep 19 '14 at 14:43
  • Does the solution have to efficient/quick? – foxidrive Sep 19 '14 at 15:13
  • What character set is the data between the `|` characters? Is it only alphanumeric? – foxidrive Sep 19 '14 at 15:42
  • the fields can be numeric or alphanumeric. the solution need not be efficient/quick. – Novice Smaw Sep 19 '14 at 17:12
  • there will be 36 fields, each field is separated by delimiter | the last field does not have a delimiter so i would assume each line should have 35 delims.if the delims if less than 35 or more than 35 then it is bad data so i need to report that data(bad) – Novice Smaw Sep 19 '14 at 17:15

3 Answers3

1
@ECHO Off
SETLOCAL
:: Looking for exactly 36 fields - no empty fields
FOR /f "delims=" %%a IN (q25936461.txt) DO (
 SET good=Y
 FOR /f "tokens=1,30*delims=|" %%m IN ("%%a") DO (
  IF "%%o" equ "" (SET "good=") ELSE (
   FOR /f "tokens=1,6,7delims=|" %%p IN ("%%o") DO (
    IF "%%r" neq "" SET "good="
    IF "%%q" equ "" SET "good="
   )
  )
 )
 IF NOT DEFINED good ECHO(%%a
)
ECHO ========== method 1 done =============
:: Looking for exactly 36 fields - allow empty fields
FOR /f "delims=" %%a IN (q25936461.txt) DO (
 SET good=Y
 SET "line=%%a"
 SET /a count=0
 CALL :analyse
 IF NOT DEFINED good ECHO %%a
)
ECHO ========== method 2 done =============

GOTO :EOF
:analyse
SET "linem=%line:*|=%"
IF "%linem%" neq "%line%" SET /a count+=1&SET "line=%linem%"&GOTO analyse
IF %count% neq 35 SET "good="
GOTO :eof

Here's two methods. Testing is your problem....

Magoo
  • 77,302
  • 8
  • 62
  • 84
  • It works like a charm/perfectly if the last field contains a not null value.if the last field is null it gives wrong results.I still accept it. – Novice Smaw Sep 27 '14 at 08:32
0

On

cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00

this command

findstr /r /i /n /v "^.*|.*|.*|.*$" "C:\Users\User\Desktop\test.txt"

shows

6:cat|26/7/14|$15.00

Type findstr /? for more.

Noodles
  • 1,981
  • 1
  • 11
  • 4
0

It could be done with FINDSTR alone if the number of columns on valid lines was <=15.

For example, the following would show all lines that do not have exactly 3 columns:

findstr /vx "[^|]*|[^|]*|[^|]*" test.txt

But FINDSTR cannot handle more than 15 character class terms. See What are the undocumented features and limitations of the Windows FINDSTR command? for more info. Your search would require 35 such terms.

The following solution returns all the faulty lines, except it ignores empty lines. It relies on REPL.BAT - a hybrid JScript/batch utility that performs a regex search/replace on stdin and writes the result to stdout. REPL.BAT is pure script that will run on any modern Windows machine from XP onward.

The solution uses REPL.BAT to remove all characters from lines that have exactly 36 columns, and then uses FINDSTR to print remaining lines that have at least one character.

<test.txt repl "^([^|]*\|){35}[^|]*$" ""|findstr .
Community
  • 1
  • 1
dbenham
  • 127,446
  • 28
  • 251
  • 390