Removing chars after second space

Question

Assume that I have this text:

eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg] : 20.4453125 eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+Hm[A1sg] : 21.7978515625

I want to remove everything after the second space. Output should be:

eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]

Jeff Y · Accepted Answer · 2015-12-20T21:01:16.947

4

If you are absolutely certain that the format (as to spacing) will always be exactly as you've shown it in the question, a simpler solution might be appropriate, but I would dig deeper into the semantics of your data to give a more robust solution.

1) If spacing could possibly vary but you definitely want only the first two non-space-containing sequences, use awk '{print $1,$2}'.

2) If the : is significant and guaranteed to be present, I would use that rather than spaces to delimit what you are after: awk -F: '{print $1}'.

3) I would not recommend any sed/regex solution unless there can be more than one sequential space and it is critical to preserve the exact amount of such space.
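As a rough sketch of options 1 and 2 above, using the sample line from the question (the function names `first_two_fields` and `before_colon` are made up for illustration):

```shell
line='eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg] : 20.4453125 eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+Hm[A1sg] : 21.7978515625'

# Option 1: first two whitespace-separated fields, rejoined with a single space.
first_two_fields() {
    printf '%s\n' "$1" | awk '{print $1, $2}'
}

# Option 2: everything before the first colon.
# The space that precedes the colon is kept, so the result ends in a space.
before_colon() {
    printf '%s\n' "$1" | awk -F: '{print $1}'
}

first_two_fields "$line"
before_colon "$line"
```

Both calls print the two wanted fields; the `-F:` variant leaves one trailing space, which may or may not matter for your data.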

edited Dec 20 '15 at 21:01

answered Dec 20 '15 at 20:56


no result with this try: cat parse.txt | awk -F: '{print $1}' > out1.txt – JayGatsby Dec 20 '15 at 21:01
It's working for me: `$ cat out1.txt`-->`eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]` – Jeff Y Dec 20 '15 at 21:04
I'm using OS X. Maybe I should try under Linux? – JayGatsby Dec 20 '15 at 21:09

Josh Crozier · Answer 2 · 2015-12-20T21:09:13.043

2

You could use a capturing group to capture everything before the second space:

(.*?\s.*?)\s.*

And then replace everything with the first capturing group match.

So (.*?\s.*?)\s.* replaced with \1 would output:

eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]

Alternatively, you could also replace . with \S:

(\S*\s\S*)\s.*

Same output.
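If you want to apply this with `sed`, keep in mind that POSIX regular expressions have no lazy quantifiers such as `.*?`, and that `\s` and `\S` are GNU extensions that may not be available in other `sed` implementations. A minimal sketch that sticks to POSIX basic regex syntax, assuming the separators are plain spaces (the function name `keep_before_second_space` is only illustrative):

```shell
# Keep everything before the second space character.
# [^ ]* matches a run of non-space characters, so no lazy matching is needed.
keep_before_second_space() {
    printf '%s\n' "$1" | sed 's/^\([^ ]* [^ ]*\) .*/\1/'
}

keep_before_second_space 'eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg] : 20.4453125'
```

Lines that contain fewer than two spaces do not match the pattern and are printed unchanged.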

edited Dec 20 '15 at 21:09

answered Dec 20 '15 at 20:38


@JayGatsby See: http://stackoverflow.com/questions/13043344/search-and-replace-in-bash-using-regular-expressions – Josh Crozier Dec 20 '15 at 20:51
@JayGatsby `echo $string | sed 's/\(\S*\s\S*\)\s.*/\1/g'` worked for me. – Josh Crozier Dec 20 '15 at 21:11
@JoshCrozier: `echo $string | sed -r 's/^(\S+\s\S+)\s.*/\1/'`: the `-r` switch enables ERE (brackets and plus don't need escaping), and there is no reason to use the global `g` flag – Giuseppe Ricupero Dec 20 '15 at 21:23

score 2 · Answer 3 · answered Dec 20 '15 at 21:15

You can also use a simple cut to do the job:

~$ echo 'eskitirim ... ' | cut -d' ' -f-2        # or -f1,2
# eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]

~$ echo 'eskitirim ... ' | cut -d':' -f1
# eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]
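One thing to watch for: `cut` treats every single space as a field boundary, while `awk` collapses runs of blanks by default. A small sketch of the difference, using a made-up input with two spaces between the first two words (`with_cut` and `with_awk` are illustrative names):

```shell
# cut: each space starts a new field, so the second field here is empty.
with_cut() {
    printf '%s\n' "$1" | cut -d' ' -f1,2
}

# awk: consecutive blanks count as one separator.
with_awk() {
    printf '%s\n' "$1" | awk '{print $1, $2}'
}

with_cut 'a  b c'
with_awk 'a  b c'
```

With this input, `with_cut` prints `a` followed by a space, and `with_awk` prints `a b`. For the single-spaced line in the question both give the same result.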
