I have a file, file1.txt, like this:

This is some text.
This is some more text. ② This is a note.
This is yet some more text.

I need to delete any text appearing after "②", including the "②" and any single space appearing immediately before, if such a space is present. E.g., the above file would become file2.txt:

This is some text.
This is some more text.
This is yet some more text.

How can I delete the "②", anything coming after, and any preceding single space?

Community
Village

4 Answers

A Perl solution:

$ perl -CS -i~ -p -E's/ ?②.*//' file1.txt

You'll end up with the correct data in file1.txt and a backup of the original file in file1.txt~.

Dave Cross

Keep in mind that many Unix utilities do not handle Unicode reliably. I assume your input is UTF-8; if not, adjust the iconv encodings accordingly.

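#!/bin/bash
# Approach: convert the input to UTF-16, treat each 16-bit code unit as a
# hex word, edit the words as text, then turn the hex back into bytes.

# Print a string of hex digits as raw bytes, two digits at a time.
function px {
 local a="$*"
 local i=0
 while [ "$i" -lt "${#a}" ]
  do
   printf "\\x${a:$i:2}"
   i=$((i+2))
  done
}
# od -v prints every line (no "*" for repeats); cut drops the offset
# column; xargs puts one hex word on each line.
(iconv -f UTF-8 -t UTF-16 | od -v -x | cut -b 9- | xargs -n 1) |
if read -r utf16header   # the first word is the byte-order mark
then
 echo "$utf16header"
 out=''
 # Collect words until 000a (newline), then emit them as one line.
 while read -r line
  do
   if [ "$line" == "000a" ]
    then
     out="$out $line"
     echo "$out"
     out=''
   else
    out="$out $line"
   fi
  done
 # Emit a final line that has no trailing newline.
 if [ "$out" != '' ] ; then
   echo "$out"
 fi
fi |
 # 0020 is a space and 2461 is ②: drop at most one space, the ②, and the
 # rest of the line, keep the newline, then remove the word separators.
 (perl -pe 's/( 0020)? 2461( .*)?$/ 000a/;s/ *//g') |
 while read -r line
  do
    px "$line"
  done | (iconv -f UTF-16 -t UTF-8 )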
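
The script above edits the text as hexadecimal UTF-16 code units, which is where the 0020 and 2461 in its Perl expression come from. The following sketch only illustrates that representation; it assumes iconv and od are available, and the function name hex_units is hypothetical:

```shell
# hex_units: hypothetical helper; prints the UTF-16 code units of
# UTF-8 text read from stdin as four-digit hex words, one per line.
hex_units() {
  # UTF-16BE fixes the byte order and emits no byte-order mark;
  # od -tx1 dumps single bytes, so the host byte order does not matter.
  iconv -f UTF-8 -t UTF-16BE | od -An -v -tx1 | tr -d ' \n' | fold -w 4
  echo
}

printf 'a ②\n' | hex_units
```

For the input "a ②" followed by a newline, the words are 0061 (a), 0020 (space), 2461 (②) and 000a (newline).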
pizza

sed -e 's/ \{0,1\}②.*//' file1.txt > file2.txt

However, I am not sure whether sed parses the ② symbol correctly. You may have to spell it out as its UTF-8 byte sequence or something similar.
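
One way to sidestep the question of how sed decodes ② is to match its UTF-8 bytes directly in the C locale. A minimal sketch, assuming the input is UTF-8; the function name strip_note_bytes is hypothetical:

```shell
# strip_note_bytes: hypothetical helper; reads stdin and writes stdout.
strip_note_bytes() {
  # "②" (U+2461) is the three bytes E2 91 A1 in UTF-8, written here
  # as octal escapes for printf. LC_ALL=C makes sed treat its input
  # as plain bytes, so the locale's Unicode support no longer matters.
  # " \{0,1\}" matches at most one space before the symbol.
  LC_ALL=C sed "s/ \{0,1\}$(printf '\342\221\241').*//"
}

printf 'This is some more text. ② This is a note.\n' | strip_note_bytes
```

Because the symbol is built from escapes, the command itself contains only ASCII and does not depend on the encoding of the script or terminal.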

Matthias

Try this:

sed -e '/②/ s/ \{0,1\}②.*$//' file1.txt > file2.txt
  • /②/ restricts the substitution to lines containing the magic symbol;
  •  \{0,1\} matches at most one space before the magic symbol;
  • .*$ matches everything else up to the end of the line.
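
The /②/ address only limits which lines the substitution is attempted on; the s command on its own leaves lines without ② unchanged anyway. A small sketch comparing both forms on the sample from the question, using a pattern that removes at most one space; the function names are hypothetical:

```shell
# sample: prints the three lines of file1.txt from the question.
sample() {
  printf '%s\n' \
    'This is some text.' \
    'This is some more text. ② This is a note.' \
    'This is yet some more text.'
}

# The same substitution with and without the /②/ address.
with_address()    { sed -e '/②/ s/ \{0,1\}②.*$//'; }
without_address() { sed -e 's/ \{0,1\}②.*$//'; }

sample | with_address
if [ "$(sample | with_address)" = "$(sample | without_address)" ]; then
  echo "both forms give the same output"
fi
```

The address is therefore optional; it mainly documents the intent and skips the substitution attempt on lines without the symbol.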
vyegorov