
I'm not sure if this is possible, but I'd like to split a file into multiple files, based on the number of occurrences of a specified character on each line.

Let's use a colon (:) as an example.

File.txt contains the following data (example):

Stack:Overflow   
Stack:Overflow:Flow    
Stack:Over:Flow:Com

A line containing 1 colon goes to 1.txt
A line containing 2 colons goes to 2.txt
A line containing 3 colons goes to 3.txt

(And of course) there would be no upper limit on the number of colons, and the data won't necessarily match the example's pattern.

Sorry if this is a vague question; it's my first time posting on Stack Overflow in a long time.


Another side question: inserting a specific character between two different regexes.

Data:

Stack@Stack.Stack192.168.0.1

I'm trying to insert a delimiter, ":", between the matches of two different regexes.
Regex #1 being: [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}
Regex #2 being: [0-9]{1,4}\.[0-9]{1,4}\.[0-9]{1,4}\.[0-9]{1,4}

So the desired output would be:

Stack@Stack.Stack:192.168.0.1
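For illustration, a minimal sketch of the transformation I have in mind, assuming GNU sed with extended regexes (`-E`) and using the two patterns above verbatim (this is a guess at an approach, not a solution I already have):

```shell
# Capture the email-like match (\1) and the IP-like match (\2),
# then rejoin them with a ":" in between.
printf 'Stack@Stack.Stack192.168.0.1\n' |
  sed -E 's/([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6})([0-9]{1,4}\.[0-9]{1,4}\.[0-9]{1,4}\.[0-9]{1,4})/\1:\2/'
# prints: Stack@Stack.Stack:192.168.0.1
```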
  • Please clarify what `format may not necessarily always match the exampled pattern.` means as the input you've shown is just a bunch of lines with colons in them so it's hard to imagine what could be different from that other than just lines without colons in them – Ed Morton Jul 09 '22 at 12:39
  • Regarding your edit, you can use sed: `sed -E 's/regex2$/:&/'` or `sub(/regex2$/, ":&")` in awk, but it won't work if the last segment of the domain ends in a number. Also note that you won't match all valid email addresses with that regex. Maybe relevant: https://stackoverflow.com/questions/201323/how-can-i-validate-an-email-address-using-a-regular-expression – dan Jul 09 '22 at 13:32
  • @SSAS : are you frequently dealing with 4-digit IPv4 addresses ? – RARE Kpop Manifesto Jul 09 '22 at 19:59
  • Regarding `Another side question` - no, ask one question at a time If you have another side question, then simply post a new question. – Ed Morton Jul 10 '22 at 11:55

2 Answers


With GNU AWK, this approach will produce your expected outcome:

awk -F":" '{print > ((NF - 1)".txt")}' file.txt

NB: if you have a large number of distinct delimiter counts (hundreds to thousands) you may also run into trouble from having too many open files (`ulimit -n` will tell you how many files you can have open at one time; on my system it's 256).
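If you do hit that limit, one possible workaround (a sketch, not part of the original answer) is to close each output file after every write; note the switch to `>>` so that a re-opened file is appended to rather than truncated:

```shell
# Create the sample input from the question.
printf 'Stack:Overflow\nStack:Overflow:Flow\nStack:Over:Flow:Com\n' > file.txt

# Same split as above, but close each output file after every write so
# the number of simultaneously open output files never exceeds one.
awk -F':' '{out = (NF - 1) ".txt"; print >> out; close(out)}' file.txt
```

This trades speed (one open/close per line) for never running into `ulimit -n`.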

jared_mamrot
  • It appears you are not! Thank you, perfect solution! Edit: Yeah, works for my situation :) – StackStackAndStack Jul 09 '22 at 12:40
  • I have another question @jared_mamrot - should I post it here to you or make another separate post – StackStackAndStack Jul 09 '22 at 12:47
  • If you edit your question to include the 'extra bit' I can let you know @StackStackAndStack – jared_mamrot Jul 09 '22 at 12:48
  • @EdMorton Do you have a source for the unparenthesized expression thing? Or an example of an awk where it's a problem? I can't see this in the POSIX awk man page. It just says `the expression shall be evaluated to produce a string that is used as a pathname into which to write`. – dan Jul 09 '22 at 12:58
  • Not sure how best to approach your 'updated' problem @StackStackAndStack; you can use GNU `sed` e.g. `echo "Stack@Stack.Stack192.168.0.1" | sed -n 's/\([A-Za-z0-9._%+-]\+@[A-Za-z0-9.-]\+\.[A-Za-z]\{2,6\}\)\([0-9]\{1,4\}\.[0-9]\{1,4\}\.[0-9]\{1,4\}\.[0-9]\{1,4\}\)/\1:\2/p'` gives the expected output (Stack@Stack.Stack:192.168.0.1), but it feels less-than-ideal. Perhaps it's best to post it as another question – jared_mamrot Jul 09 '22 at 13:02
  • That provided solution was once again perfect for me, thank you so much Jarred :) – StackStackAndStack Jul 09 '22 at 13:12
  • @dan from the POSIX spec https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html, the input one is `The getline operator can form ambiguous constructs when there are unparenthesized binary operators (including concatenate) to the right of the '<' (up to the end of the expression containing the getline). The result of evaluating such a construct is unspecified, and conforming applications shall parenthesize properly all such usages.` – Ed Morton Jul 09 '22 at 13:22
  • I'd have to spend more time than I'm willing to reading the spec to find the output equivalent apparently, but you can find questions in this forum asking about syntax errors due to not having parens, e.g. https://stackoverflow.com/q/52781584/1745001, and it's mentioned in the GNU awk manual in the string concatenation section, https://www.gnu.org/software/gawk/manual/html_node/Concatenation.html#Concatenation – Ed Morton Jul 09 '22 at 13:38
  • @jared_mamrot : there's absolutely no reason why it's `gawk`-only : every `awk` released in at least the past 10 years can do that. – RARE Kpop Manifesto Jul 09 '22 at 15:10

awk can do this quite easily:

awk -F : '{print > NF-1".txt"}' File.txt
  • With : as a field separator, the number of fields (NF) minus one equals the number of field separators, which we can use for the file name.
  • You can replace the colon with any character except space, which must be written as `[ ]` or in octal notation: `-F '\\40'` or `{FS="\\40"}` (thanks @RARE Kpop Manifesto); the double backslash matters, since a single one yields a plain space, which awk handles specially.
  • The field separator is normally a regular expression, except that a single character is treated literally (space excepted).
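A quick demonstration of the space-as-separator caveat (a sketch; any POSIX awk should behave this way):

```shell
# -F'[ ]' splits on every single literal space, so the double space
# between b and c produces an empty field: 4 fields in total.
printf 'a b  c\n' | awk -F'[ ]' '{print NF}'   # prints 4

# -F' ' is the default behaviour: runs of whitespace act as one
# separator, and leading/trailing whitespace is ignored: 3 fields.
printf 'a b  c\n' | awk -F' ' '{print NF}'     # prints 3
```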
dan
  • That's the same as @jared_mamrot's earlier answer, https://stackoverflow.com/a/72921389/1745001 – Ed Morton Jul 09 '22 at 12:53
  • @EdMorton I know. We must have been typing at the same time. Also, I think it's worth describing the behaviour of a single-character FS, especially if it's a space, as the question was about using any character. – dan Jul 09 '22 at 13:11
  • @dan : you can also write it as `FS='\\40'` and it'll be treated as just a single `0x20` instead of special processing rules (the double backslash is a must; make it a single one, `FS='\40'`, and that's the same as the default) – RARE Kpop Manifesto Jul 09 '22 at 15:13