
I'm looking to pick a random word from /usr/share/dict/words, remove any non-alphabetic (English) characters, and make the output lower-case. Here's what I have so far:

sed "$(shuf -i "1-$(cat /usr/share/dict/words | wc -l)" -n 1)q;d" /usr/share/dict/words | tr '[:upper:]' '[:lower:]' | sed 's/[^-a-z]//g'

This works fine, but is it possible to do it all in a single sed command?


EDIT: The American word file looks like this:

A
A's
AMD
AMD's
AOL
AOL's
AWS
AWS's
Aachen
Aachen's

I'm looking to make this lower-case and remove any non-alphabetic characters (as mentioned in my original question). The solution I have works fine but I'm hoping to reduce the number of commands (maybe just sed?). Output of the above would then be:

a
as
amd
amds
aol
aols
aws
awss
aachen
aachens
Neil C. Obremski
  • `sed` can do `tr` but it can't easily be made to implement `shuf` or `wc`, so, no, unlikely you can do it all in the one `sed` command – jhnc May 12 '21 at 17:33
  • `I'm hoping to reduce the number of commands` What for? Write a function; then it will be one command. `do it all in the one sed command?` sed is Turing complete, but any realistic sed script that solves this will be hundreds of pages long, mostly because sed lacks arithmetic. – KamilCuk May 12 '21 at 17:34
  • I know sed can't do `shuf` so I should have been more specific. I am piping sed output into sed and tr so I know there's some optimization that could be done with that but I'm not sed-savvy enough (yet) to know that. I suppose I'll just figure it out myself and post it when I do – Neil C. Obremski May 12 '21 at 17:39
  • `tr` should be way faster than `sed`, there's nothing to optimize. https://stackoverflow.com/questions/4569825/sed-one-liner-to-convert-all-uppercase-to-lowercase - does this answer your question? – KamilCuk May 12 '21 at 17:40
  • `shuf -n 1 – jhnc May 12 '21 at 17:53

3 Answers

2

You don't need sed and wc: shuf can pick random lines from a file directly.
tr can delete the non-alphabetic characters, so you don't need the second sed either.

shuf -n1 /usr/share/dict/words | tr -dc '[:alpha:]' | tr '[:upper:]' '[:lower:]'
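
One thing to be aware of: `tr -dc '[:alpha:]'` deletes the trailing newline as well, so the word comes out without a line ending. A minimal sketch of a variant that keeps the newline is below. The function name `normalize_words` is made up for illustration, and `LC_ALL=C` is an assumption about what "English" should mean here (ASCII letters only).

```shell
# Hypothetical helper: clean up words read from stdin, one per line.
# LC_ALL=C pins the character classes to ASCII letters (assumption);
# '\n' is added to the kept set so every word still ends with a newline.
normalize_words() {
  LC_ALL=C tr -dc '[:alpha:]\n' | LC_ALL=C tr '[:upper:]' '[:lower:]'
}

# Usage, assuming the word list exists at this path:
#   shuf -n1 /usr/share/dict/words | normalize_words
```

Unlike the `sed 's/[^-a-z]//g'` in the question, this also drops hyphens.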
glenn jackman
1

This single awk command should do the job:

awk '{gsub(/[^[:alpha:]]+/, ""); print tolower($0)}' file

a
as
amd
amds
aol
aols
aws
awss
aachen
aachens
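
If the random pick should also happen inside the same awk process, one possible approach is to keep line number NR with probability 1/NR (reservoir sampling with a reservoir of one), then clean up whichever line survives. This is only a sketch: the function name `random_clean_word` is hypothetical, and it uses `[^A-Za-z]` instead of `[:alpha:]` on the assumption that only ASCII letters are wanted.

```shell
# Hypothetical helper: print one random, cleaned-up word.
# Reads the files named as arguments, or stdin if none are given.
random_clean_word() {
  LC_ALL=C awk '
    BEGIN { srand() }                # seed from the clock (one-second resolution)
    rand() * NR < 1 { line = $0 }    # keep line NR with probability 1/NR
    END {
      gsub(/[^A-Za-z]+/, "", line)   # drop everything except ASCII letters
      print tolower(line)
    }
  ' "$@"
}

# Usage, assuming the word list exists at this path:
#   random_clean_word /usr/share/dict/words
```

Because `srand()` seeds from the time of day, two runs within the same second will most likely pick the same word.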
anubhava
1

This might work for you (GNU sed and shuf):

shuf -n1 /usr/share/dict/words | sed 's/[^[:alpha:]-]//g;s/.*/\L&/'

Choose a random line, remove any non-alpha (except hyphen) characters and lowercase the result.
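
`\L` is a GNU extension. Where only POSIX sed is available, the `y` command can probably stand in for it, since it transliterates characters the way `tr` does. A sketch, with a hypothetical function name and `LC_ALL=C` added on the assumption that only ASCII letters matter:

```shell
# Hypothetical helper: the same cleanup without GNU-only features.
# The first expression deletes everything except ASCII letters and hyphens;
# the second maps upper-case letters to lower-case.
clean_posix() {
  LC_ALL=C sed -e 's/[^A-Za-z-]//g' \
               -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
}

# Usage, assuming the word list exists at this path:
#   shuf -n1 /usr/share/dict/words | clean_posix
```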

potong
  • This is pretty good. Something I tried (and failed) to figure out is how to do all of that `sed` stuff to a specific line from a larger file. I got around the issue by piping it in like you're doing here but I wondered, if I know line 17 is what I want then why can't I write `17q;s/...blahblah.../` ... it just didn't do what I expected – Neil C. Obremski May 13 '21 at 16:14
  • @NeilC.Obremski It is the other way round: do the substitutions first and quit afterwards, and also use the `-n` option and the `p` flag on the substitution command, e.g. `sed -n '17{s/[^[:alpha:]-]//g;s/.*/\L&/p;q}' file` – potong May 13 '21 at 22:30
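
To use the line-address form from the last comment with a line number that is not hard-coded, the number can be passed in as a shell parameter. This is a rough sketch assuming GNU sed (for `\L`); the function name `nth_clean_word` and its argument order are made up for illustration.

```shell
# Hypothetical helper: print line $1 of file $2, cleaned up and lower-cased.
# Requires GNU sed because of \L. Double quotes let the shell expand ${1};
# \L and & are passed through to sed unchanged.
nth_clean_word() {
  sed -n "${1}{s/[^[:alpha:]-]//g;s/.*/\L&/p;q}" "$2"
}

# Usage, assuming the word list exists at this path:
#   f=/usr/share/dict/words
#   nth_clean_word "$(shuf -i "1-$(wc -l < "$f")" -n 1)" "$f"
```

The `q` inside the braces stops sed as soon as the chosen line has been printed, so the rest of the file is not processed.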