I'm trying to use the sed command to clean up a text file:

sed -i.bak -e 's@^[A-Za-z0-9_.;,:]+$@@g' *.txt

returns

sed: RE error: illegal byte sequence

What am I doing wrong with the regular expression? I intend it to mean: replace everything that isn't A-Za-z0-9_.;,: with "".

Michele
  • You might want to use POSIX character class names for clarity: `sed -i.bak -e 's/[^[:alnum:][:punct:][:blank:]]//g' *.txt` -- you're aware that you're also removing spaces and tabs? – glenn jackman Dec 11 '14 at 00:53
  • possible duplicate of [RE error: illegal byte sequence on Mac OS X](http://stackoverflow.com/questions/19242275/re-error-illegal-byte-sequence-on-mac-os-x) – glenn jackman Dec 11 '14 at 00:54
  • Oh, yeah, how do I keep spaces? – Michele Dec 11 '14 at 01:19
  • PS, you do not need the `-e` since you only have one command in your `sed` invocation – Jotne Dec 11 '14 at 06:50
  • @Michele, like I show in my answer: remove chars that are not alphanumeric, punctuation or blanks. – glenn jackman Dec 11 '14 at 11:27

3 Answers

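You put the ^ in the wrong place. Outside the brackets it anchors the match to the start of the line; to negate the set it has to be the first character inside the brackets:

sed -i.bak -e 's@[^A-Za-z0-9_.;,:]@@g' *.txt

Note the other small changes: the + and the $ are gone, because with the g flag every character outside the set is removed wherever it occurs, not just a run at the end of the line.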
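
To compare the two placements of ^ without touching any files, here is a minimal sketch that reads from standard input. The function names `anchored` and `negated` are hypothetical, and `-E` is used in the first one so that + means "one or more" (in basic regular expressions, as in the question's command, a bare + is a literal character).

```shell
# Hypothetical helpers for illustration; both read stdin and write stdout.

# ^ outside the brackets anchors to the start of the line, so only lines made
# up entirely of allowed characters match, and those lines are emptied.
anchored() { LC_ALL=C sed -E 's@^[A-Za-z0-9_.;,:]+$@@'; }

# ^ as the first character inside the brackets negates the set; with the g
# flag every character outside the set is deleted.
negated() { LC_ALL=C sed 's@[^A-Za-z0-9_.;,:]@@g'; }

printf '%s\n' 'abc' 'a b!c' | anchored
printf '%s\n' 'abc' 'a b!c' | negated
```

The anchored version empties the line `abc` and leaves `a b!c` alone, which is the opposite of what the question wants; the negated version keeps `abc` and turns `a b!c` into `abc`.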

Gilles Quénot

Let's say you have something like this in a file named "my_file":

Location: http://www.google.gy/?gws_rd=cr&ei=l_KIVOXnIsinNq2NgsgB [following]
--2014-12-10 21:25:44--  http://www.google.gy/?gws_rd=cr&ei=l_KIVOXnIsinNq2NgsgB
Resolving www.google.gy (www.google.gy)... 64.233.176.94, 2607:f8b0:4002:c05::5e
Connecting to www.google.gy (www.google.gy)|64.233.176.94|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html.2'

You can try

sed -i.bak -e 's#[^[:alnum:].;,:]##g'  'my_file'

This deletes every character that is not alphanumeric or one of ".", ";", "," or ":" (so spaces and underscores are removed too). Result:

Location:http:www.google.gygwsrdcreilKIVOXnIsinNq2NgsgBfollowing
2014121021:25:44http:www.google.gygwsrdcreilKIVOXnIsinNq2NgsgB
Resolvingwww.google.gywww.google.gy...64.233.176.94,2607:f8b0:4002:c05::5e
Connectingtowww.google.gywww.google.gy64.233.176.94:80...connected.
HTTPrequestsent,awaitingresponse...200OK
Length:unspecifiedtexthtml
Savingto:index.html.2
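
The result above has lost its spaces. A minimal sketch of keeping them, following the `[:blank:]` suggestion in glenn jackman's comment on the question, might look like this. The function name `keep_blanks` is hypothetical, and it filters standard input instead of editing a file in place.

```shell
# Hypothetical helper; reads stdin and writes stdout.
# Adding [:blank:] to the negated set keeps spaces and tabs.
keep_blanks() { LC_ALL=C sed 's#[^[:alnum:][:blank:].;,:]##g'; }

printf '%s\n' 'Length: unspecified [text/html]' | keep_blanks
```

With the sample line above this prints `Length: unspecified texthtml`: the brackets and the slash are removed, the spaces stay.
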
repzero

Glenn Jackman was right: the solution found in the other post helped.

The only problem is that the command now only recognizes English Latin characters, so it won't work on my file.

Here's the result, nothing changed:

ÁÉc†ÿ°“Å9,0,sub,,0,0,0,,Pero, aun no comprendo porque quer√≠a acabar conÄC∂u⁄ÁÉx¨†ú°ñÅ996,0,sub,,0,0,0,,õÇ–†µ°ØÅ*10,0,sub,,0,0,0,,Ha deshonrado aléC∂u⁄ÁÉ©≤†”°ÕÅ11,0,sub,,0,0,0,,{\pos(1481.142,795.974)\bord0\fad(800,0)}Himalayan RangeõÇ!¸C∂u@óÁÉf†”°ÕÅ12,0,sub,,0,0,0,,¬øEsta seguro que querer hacerlo solo?, se√±or MitsumazaõÇ»†ª°µÅî13,0,sub,,0,0,0,,Silencio Tatsumi, tranquil√≠zateõÇ2C∂u@ôÁÉ,†©°£Å14,0,sub,,0,0,0,,Pero se√±or...õÇ\†≠°ßÅ\15,0,sub,,0,0,0,,Aunque lo digas...õÇ<†∏°≤Åò16,0,sub,,0,0,0,,Tengo un esp√≠ritu aventureroõÇ|C∂u@£ÁÉ@†∞°™Å17,0,sub,,0,0,0,,Lo entiendo se√±or...õÇ–†≤°¨Å–18,0,sub,,0
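
The "only knows English Latin characters" limitation can be reproduced with a small sketch. It assumes the fix taken from the linked post was to run sed under the C locale (this answer does not say which fix was used), and the function name `strip_non_alnum` is hypothetical.

```shell
# Hypothetical helper; reads stdin and writes stdout.
# Under LC_ALL=C sed treats every byte as a separate character, and only
# ASCII letters and digits count as [:alnum:].
strip_non_alnum() { LC_ALL=C sed 's/[^[:alnum:]]//g'; }

# \303\255 are the two UTF-8 bytes of an i with an acute accent,
# written as octal escapes.
printf 'quer\303\255a\n' | strip_non_alnum
```

This prints `quera`: both bytes of the accented letter fall outside the ASCII-only `[:alnum:]` class and are deleted along with everything else that is not in the set.
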
Michele