awk or sed to remove all text after the xth occurrence in each line of a file

Question

I already use sed to clean up certain keywords, and I use awk to delete everything after a ?, for example. But I have a file that looks like this:

Input:

/value1/value2/value3/morestuff

Desired output:

/value1/value2/value3

None of the values are static; I can only key on the slashes.

I need to remove everything after value3. Nothing is static except the number of slashes. Ideas?

Example of code:

awk '/User/ {print $7,$9,$13}' "$FILE" | awk -F'?' '{print $1}' | sort --unique > "$tempNAME"
sed -i 's/with/ /g' "$tempNAME"
sed -i 's/trans.*se]//' "$tempNAME"
sed -i 's/trans.*st]//' "$tempNAME"

EDIT: clarified input/output

cat, two awks, sort, and now you want to use sed? I think you can do the whole job with awk, but you don't show $FILE. – ctac_, Jun 08 '18 at 18:10

RavinderSingh13 · Accepted Answer · 2018-06-08T18:42:32.830

score 2

EDIT: Based on the OP's comments, I have edited my code as follows.

echo "/value1/value2/value3/value4/something/whatever" | awk -F"/" '{NF=4} 1' OFS="/"

Since you have not shown sample input and output, based on your description the following simple awk may help you here.

awk '{sub(/value3.*/,"value3")} 1' Input_file
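One caution about the NF=4 version: some older awk implementations do not rebuild $0 when NF is decreased. A minimal sketch of the same cut with the standard cut utility instead (a swapped-in technique, not part of this answer), assuming every line starts with a slash so that field 1 is the empty string before it; keep_three_levels is a hypothetical name:

```shell
# Hypothetical helper: keep fields 1-4 when splitting on "/".
# Field 1 is the empty string before the leading slash, so fields 2-4 are
# value1, value2 and value3, and joining 1-4 gives "/value1/value2/value3".
keep_three_levels() {
  cut -d/ -f1-4
}

printf '%s\n' /value1/value2/value3/value4/something/whatever \
  /a/b/c/morestuff | keep_three_levels
# /value1/value2/value3
# /a/b/c
```

Lines with fewer than four fields come out unchanged, and lines with no slash at all are printed in full unless cut's -s option is given.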

edited Jun 08 '18 at 18:42

answered Jun 08 '18 at 17:19

RavinderSingh13


This appears to require the "/value" portion to be static. As I mentioned in the OP, nothing is static except the slashes; the values between the slashes can be anything. I showed the sample input /anything1/anything2/anything3/everythingelse, which should be output as /anything1/anything2/anything3 – czah Jun 08 '18 at 18:32
@czah, could you please be clearer about how I can recognize where in the line I need to start removing the string? – RavinderSingh13 Jun 08 '18 at 18:34
"INPUT: /value1/value2/value3/morestuff Desired Output: /value1/value2/value3 all values are not static, I can only key on the slashes." - So I need a way to count the slashes and remove the rest of the string after, i.e., the 4th slash in a line. – czah Jun 08 '18 at 18:36
@czah, OK, so if the value3 string is not static, do I need to remove everything after a specific number of slashes? – RavinderSingh13 Jun 08 '18 at 18:37
The string will look like this: "/value1/value2/value3/value4/something/whatever" - I want to then turn it into this: "/value1/value2/value3" - So we remove everything after the 4th slash. It does not matter if we also remove the 4th slash or keep the 4th slash...either one is acceptable. – czah Jun 08 '18 at 18:40
@czah, I'd also encourage you to upvote the posts of people who try to help you, and to accept one answer as the correct one. – RavinderSingh13 Jun 08 '18 at 18:48
I think this works. I just need to figure out how to incorporate it into my script. cat $FILE | awk '/User/ {print $7,$9,$13}' | awk -F? '{print $1}' | sort --unique > $tempNAME sed -i 's/with/ /g' $tempNAME sed -i 's/trans.*se]//' $tempNAME sed -i 's/trans.*st]//' $tempNAME cat $tempNAME | awk -F"/" '{NF=4} 1' OFS="/" > $tempNAME With this I end up with a blank file... troubleshooting now! – czah Jun 08 '18 at 18:49
@czah, please see my previous comments; also, we can't fix your code without seeing a sample of your input and expected output in your post. – RavinderSingh13 Jun 08 '18 at 18:51
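The blank file in the comment above comes from cat $tempNAME | awk ... > $tempNAME: the shell truncates $tempNAME while setting up the > redirection, which typically happens before cat has read anything, so awk sees empty input. The earlier sed -i lines avoid this because GNU sed -i writes to a temporary file and renames it over the original. A minimal sketch of the same write-then-rename pattern for any filter; in_place is a hypothetical helper, and it assumes mktemp is available (GNU coreutils and the BSDs provide it):

```shell
# Hypothetical helper: run a filter over FILE and replace FILE with the result,
# without truncating FILE before it has been read.
in_place() {
  local file=$1 tmp
  shift
  # Create the temporary file next to the target so mv is a plain rename.
  tmp=$(mktemp "${file}.XXXXXX") || return
  if "$@" < "$file" > "$tmp"; then
    mv -- "$tmp" "$file"
  else
    # Leave the original untouched if the filter fails.
    rm -f -- "$tmp"
    return 1
  fi
}

# For example, the last step of the script in the comment could become:
# in_place "$tempNAME" awk -F/ '{NF=4} 1' OFS=/
```

Because the replacement file is created by mktemp, it normally ends up readable and writable only by its owner rather than keeping the original file's permissions.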

lurker · Answer 2 · 2018-06-08T18:05:25.677

I'm assuming that what you're asking for is to replace each line, which looks like /value1/value2/value3/anything, with /value1/value2/value3, where value1, value2, and value3 are independent, arbitrary strings that do not contain a slash.

Since the number of slashes is fixed, sed is adequate:

sed -E 's:^/([^/]*)/([^/]*)/([^/]*)/.*$:/\1/\2/\3:' my_input_file

This starts at the beginning of the line (^). It then matches a slash (/) followed by a capture group ((...)) containing any string that does not include a slash ([^/]*). It does that last bit 3 times. It then matches a slash and any remaining characters (/.*) up to the end of the line ($). It replaces all of that with the captured matches (\1, \2, and \3) separated by slashes (/\1/\2/\3).

I used a colon (:) as the search/replace separator instead of a slash to avoid having to escape the slashes in the match/replace strings. sed uses the first character after the s command as the separator (see How to replace strings containing slashes with sed).
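The unescaped parentheses in this expression are extended-regular-expression syntax; with sed's default basic regular expressions they would have to be written \( and \). A sketch that runs the expression with -E (accepted by current GNU sed and by BSD/macOS sed) on a few sample lines; first_three is a hypothetical name:

```shell
# Hypothetical helper: keep the first three slash-separated components.
# -E selects extended regular expressions, so ( ) group without backslashes.
# Lines with fewer than four slashes do not match and pass through unchanged.
first_three() {
  sed -E 's:^/([^/]*)/([^/]*)/([^/]*)/.*$:/\1/\2/\3:'
}

printf '%s\n' /value1/value2/value3/morestuff /x/y/z/1/2/3 /only/two | first_three
# /value1/value2/value3
# /x/y/z
# /only/two
```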

score 1 · Answer 3 · answered Jun 08 '18 at 18:03

With bash, we can split the string on slashes and then join the first 4 elements with slashes:

$ str=/a/b/c/d/e/f/g/h
$ IFS=/ read -ra dirs <<<"$str"
$ (IFS=/; echo "${dirs[*]:0:4}")
/a/b/c

We use "4" because the 0th element of the array is the empty string before the leading slash.
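The same split-and-join can be run over every line of a file. Inside a function, setting IFS=/ once with local serves both steps: read uses it to split each line, and "${dirs[*]}" uses its first character to join. A bash-only sketch, where first_three_bash is a hypothetical name and the input is assumed to end with a newline (otherwise read skips the final line):

```shell
# Hypothetical helper: keep "/c1/c2/c3" from each input line (bash only).
first_three_bash() {
  local IFS=/    # read splits on "/", and "${dirs[*]}" joins with "/"
  local -a dirs
  while read -r -a dirs; do
    # Element 0 is the empty string before the leading slash, so elements
    # 0..3 are "", c1, c2, c3, which join back to "/c1/c2/c3".
    printf '%s\n' "${dirs[*]:0:4}"
  done
}

printf '%s\n' /a/b/c/d/e/f/g/h /value1/value2/value3/morestuff | first_three_bash
# /a/b/c
# /value1/value2/value3
```

On a real file this would be first_three_bash < input_file. A read loop is much slower than sed or awk on large inputs.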

score 0 · Answer 4 · answered Jun 08 '18 at 20:13

This might work for you (GNU sed):

sed 's|/[^/]*||4g' file

Removes the fourth and every later occurrence of a / followed by zero or more non-/ characters.
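The 4g flag combines a match number with g: the number selects the fourth match, and g extends the substitution to every later match as well. GNU sed documents this combination, while POSIX leaves it unspecified. A small sketch contrasting the two flags on a made-up line:

```shell
line=/a/b/c/d/e/f
# Number alone: replace only the 4th match ("/d").
only_fourth=$(printf '%s\n' "$line" | sed 's|/[^/]*||4')
# Number plus g (GNU sed): replace the 4th match and every later one.
fourth_onward=$(printf '%s\n' "$line" | sed 's|/[^/]*||4g')
printf '%s\n' "$only_fourth" "$fourth_onward"
# /a/b/c/e/f
# /a/b/c
```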

An alternative:

sed 's/\(\(\/[^\/]*\)\{3\}\).*/\1/' file

Keeps the first three occurrences of a / followed by zero or more non-/ characters and removes the rest of the line.

Or, without the backslashes (GNU sed's -r option enables extended regular expressions):

sed -r 's#((/[^/]*){3}).*#\1#' file
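-r is GNU sed's spelling for extended regular expressions; -E is accepted by current GNU sed and by BSD/macOS sed. A sketch that uses -E and takes the number of components to keep as an argument (keep_first is a hypothetical name, and the count is assumed to be a positive integer):

```shell
# Hypothetical helper: keep the first N "/component" groups of each line.
# In double quotes, {$1} becomes e.g. {3}, and \1 stays a backreference.
# Lines with fewer than N such groups do not match and pass through unchanged.
keep_first() {
  sed -E "s#((/[^/]*){$1}).*#\1#"
}

printf '%s\n' /value1/value2/value3/morestuff /a/b | keep_first 3
# /value1/value2/value3
# /a/b
```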
