I have a 17 GB pipe-delimited .txt file. Any string between the 32nd and 33rd pipe that is longer than 10 characters needs to be truncated to its first 10 characters so that it can populate a database column. This has to be done without opening the file in Sublime Text, so it would need to be done through Java or bash on AIX. On regex101.com
I was trying to implement the ideas presented in the following post:
but the resulting pattern doesn't limit the match to only the string I want to replace.
Sample input:
|12210|IA||15||i956-743||||||l.4073||||a5015b3ed||l.464939|IC|||06 06:18:17||wireered||ENTITY|wirvered|2||||NoPodfoundorpoddoesnothaveedgetob-rd=l.415.63Z|REY||||RY|REY||
Intended output:
Change ...|NoPodfoundorpoddoesnot...|...
to ...|NoPodfound|...
Full output string after replacement/truncation:
|12210|IA||15||i956-743||||||l.4073||||a5015b3ed||l.464939|IC|||06 06:18:17||wireered||ENTITY|wirvered|2||||NoPodfound|REY||||RY|REY||
Attempt at regex match:
^(?:[^|]*\|){32}[^|]+\|
which matches everything from the start of the line through the 33rd |, so |12210.......l.415.63Z|, but I want it to match only the string between pipes 32 and 33, specifically NoPodfoundorpoddoesnothaveedgetob-rd=l.415.63Z, for replacement purposes.
update 1; 10/18/17:
(^(?:[^|]*\|){32}[^|]{0,10})([^|]*)(\|.*$)
Substituting with the captured groups \1\3 provides the desired result. But this pattern must have a flaw, since it seems to be capturing a non-capturing group, (?:[^|]*\|).
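For reference, here is a minimal Java sketch of the same substitution, since Java is one of the two options available. The class and method names (RegexTruncate, truncate) are made up for illustration. The pattern is a variation on the one above: it stops matching at the end of field 33, so the rest of the line is left alone and only group 1 is needed in the replacement. It also suggests that nothing is wrong with the (?:...) group: it gets no group number of its own, but the text it matches is still part of the enclosing group 1, which is why \1 contains the first 32 fields.

```java
import java.util.regex.Pattern;

final class RegexTruncate {
    // Group 1: the first 32 pipe-terminated fields plus at most 10 characters
    // of field 33. The trailing [^|]* matches whatever is left of field 33
    // and is dropped by the replacement.
    private static final Pattern FIELD_33 =
            Pattern.compile("^((?:[^|]*\\|){32}[^|]{0,10})[^|]*");

    // Returns the line with field 33 cut to 10 characters.
    // Lines with fewer than 32 pipes do not match and come back unchanged.
    static String truncate(String line) {
        return FIELD_33.matcher(line).replaceFirst("$1");
    }

    public static void main(String[] args) {
        String sample = "|12210|IA||15||i956-743||||||l.4073||||a5015b3ed||l.464939"
                + "|IC|||06 06:18:17||wireered||ENTITY|wirvered|2||||"
                + "NoPodfoundorpoddoesnothaveedgetob-rd=l.415.63Z|REY||||RY|REY||";
        System.out.println(truncate(sample));
    }
}
```

In Java the replacement string uses $1 where regex101 and sed-style tools use \1.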
update 2; 10/19/17:
I tried the following commands at the PuTTY command line, but they do not edit the file:
cat subStrTest.txt
awk 'BEGIN{FS=OFS="|"}{$33=substr($33,1,10)} 1' subStrTest.txt
https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html suggests that
string = substr(string,startIndex,numOfCharacters)
is valid syntax, at least for gawk, but I don't know whether the assignment
$33=substr($33,1,10)
is valid within awk for strings referenced with $, as in $33.
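Two notes on the awk attempt. Assigning to a field such as $33 is standard awk, and awk writes its result to standard output rather than back into the input file, so the file on disk stays as it was unless the output is redirected to a new file. Below is a rough Java sketch of the same split, truncate, and rejoin idea, reading standard input and writing standard output one line at a time so the 17 GB file never has to fit in memory. The names FieldTruncator and truncateField are hypothetical, and UTF-8 encoding is an assumption about the file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

final class FieldTruncator {
    // Cuts the given 1-based field to maxLen characters, like
    // $33=substr($33,1,10) in awk with FS=OFS="|".
    static String truncateField(String line, int fieldNumber, int maxLen) {
        // Limit -1 keeps trailing empty fields, so rejoining restores the line.
        String[] fields = line.split("\\|", -1);
        int i = fieldNumber - 1;
        if (i < fields.length && fields[i].length() > maxLen) {
            fields[i] = fields[i].substring(0, maxLen);
        }
        return String.join("|", fields);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the data is UTF-8; change the charset if it is not.
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(System.in, StandardCharsets.UTF_8));
             BufferedWriter out = new BufferedWriter(
                 new OutputStreamWriter(System.out, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(truncateField(line, 33, 10));
                out.newLine(); // platform line separator
            }
        }
    }
}
```

A possible invocation, with truncated.txt as a placeholder output name: java FieldTruncator < subStrTest.txt > truncated.txt. One difference from the awk one-liner: this sketch leaves lines with fewer than 33 fields untouched, whereas assigning to $33 in awk pads such lines with extra empty fields.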