
Reading from /dev/random gives me many ?s:

$ head -n 1 /dev/random 
???i??0?4?=K?"?0??^Yx??????b"??k?]?JD?Xǀ?ʝֹ??6;?% ???SW#,?z??6?&?V?/7@??H??????Vg?G?Y*Դ?T???[ޘ?fj?|_r?4?-E??"?.?l^u,??'?N?Ƒ???\?v????7S?\ԔJRcɁ??k??Sn?Ԟ?
                                                 ??^?????a?M{?????~??????+???????EC????J̡

This makes me think that the ?s are characters that my terminal did not display correctly. I have my terminal set to display with UTF-8.

Is this an issue with encodings? Or is this expected since random numbers may not always encode valid characters?

My goal is to generate random sequences of Unicode characters easily on the command line. Specifically, each valid Unicode byte sequence should have some non-zero probability of appearing, and no invalid Unicode byte sequences should appear.

Pedro Cattori

1 Answer


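/dev/random returns raw random bytes, not ASCII or UTF-8 text. It is a byte stream in which each byte can take any value from 0 to 255. Many of the resulting byte sequences are not valid UTF-8, so your terminal cannot decode them and shows ? in their place. This is expected behavior, not a misconfigured terminal.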
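To see why random bytes rarely form valid UTF-8, here is a minimal Python sketch. The helper name `is_valid_utf8` is made up for this illustration, and `os.urandom` stands in for reading /dev/random directly.

```python
import os


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A lone continuation byte (0x80) and the byte 0xFF never occur on their own in valid UTF-8.
print(is_valid_utf8(b"hello"))     # True
print(is_valid_utf8(b"\x80"))      # False
print(is_valid_utf8(b"\xff\xfe"))  # False

# Decode random bytes leniently: each invalid sequence becomes U+FFFD,
# which is roughly what a terminal does when it prints "?".
sample = os.urandom(64)
decoded = sample.decode("utf-8", errors="replace")
print(decoded.count("\ufffd"), "replacement characters in 64 random bytes")
```

The count on the last line varies from run to run, but it will usually be well above zero.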

A command like:

head -c 255 /dev/random | openssl base64

will give you Base64 output only: ASCII letters, digits, +, / and = padding. If you're looking for random, valid UTF-8 data, you will need to write a program that generates random numbers (possibly by reading from /dev/random) and uses them to pick random Unicode characters.

Something like this answer
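As a rough sketch of that approach (not the linked script), the following Python picks code points uniformly from the operating system's random source and skips the surrogate range U+D800 to U+DFFF, which cannot be encoded in UTF-8. The function names are made up for this illustration. The sketch assumes that "valid" means any Unicode scalar value, so control characters, unassigned code points and noncharacters can all appear.

```python
import secrets

# Surrogates (U+D800..U+DFFF) are not scalar values and have no UTF-8 encoding.
SURROGATE_START = 0xD800
SURROGATE_END = 0xDFFF
SURROGATE_COUNT = SURROGATE_END - SURROGATE_START + 1
MAX_CODE_POINT = 0x10FFFF


def random_scalar_value() -> str:
    """Return one uniformly chosen Unicode scalar value as a 1-character str."""
    # Draw from a range with the surrogate block removed, then step over the gap.
    n = secrets.randbelow(MAX_CODE_POINT + 1 - SURROGATE_COUNT)
    if n >= SURROGATE_START:
        n += SURROGATE_COUNT
    return chr(n)


def random_unicode(length: int) -> str:
    """Return a string of `length` random Unicode scalar values."""
    return "".join(random_scalar_value() for _ in range(length))
```

To use it from the command line, write `random_unicode(16).encode("utf-8")` to `sys.stdout.buffer`. Most of the output will still look odd in a terminal, because the bulk of the code space is unassigned or not covered by common fonts, but every byte sequence produced is valid UTF-8.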

Josh
  • thanks. I've updated my question to specify my goal. Also, I have my terminal set to UTF-8 encoding, not ASCII. – Pedro Cattori Dec 07 '15 at 05:48
  • @PedroCattori I've updated the answer. There's no easy one-liner I can think of for that, but I linked to another Stack Overflow answer with a Python script. – Josh Dec 07 '15 at 05:58