3

I've written a little script that generates a random URL. It works, but I would like the generated URL to be somewhat believable, meaning I want it to use real words. As of now it generates 7 random characters and digits.

def generate_url(length=7)
  protocol = %w(https:// http://)
  body = rand(36**length).to_s(36)
  num = rand(1..999).to_s
  url = "#{protocol.sample}#{body}.com/php?id=#{num}"
  puts url
end

#<= http://s857yi5.com/php?id=168
#<= https://6rm0oq3.com/php?id=106
#<= http://skhvk1n.com/php?id=306

So what I'm looking for is an easy way to do this with real words in place of the random 7-character string (please keep it between 7 and 10 characters), without using an external gem.

My OS is Windows 7

13aal
  • 1
    You can provide a list of short words and combine several of them into the desired string. – hrust May 05 '16 at 12:34
  • @Gemma Yeah, I could do that, but it would defeat the purpose of this being a random generator. A list only has so many options; if it's random, it will never be the same. – 13aal May 05 '16 at 12:35
  • 2
    You cannot programmatically generate "real" words without a list of real words. Use a list and pick some randomly. – Alex K. May 05 '16 at 12:40
  • @AlexK. Why can't you? – 13aal May 05 '16 at 12:41
  • 1
    Well, what algorithm produces "KITTEN" ? – Alex K. May 05 '16 at 12:42
  • @AlexK. So you're telling me that there's thousands of different ways to produce random digits and random characters, and absolutely no way to produce a random word? – 13aal May 05 '16 at 12:43
  • I suggest to have a look into the implementation of the `Faker` gem that does exactly what you need: generate random URLs. Find the implementation here: [`Faker::Internet.url`](https://github.com/stympy/faker/blob/master/lib/faker/internet.rb#L138). – spickermann May 05 '16 at 12:45
  • 1
    `cat` is a valid word, whereas `ykw` is not. How should an algorithm know which words are valid and which aren't without using a dictionary? And if you still have to use a dictionary, then just pick a random word from it. – spickermann May 05 '16 at 12:48
  • 3
    A word (when read by a human) is not a random collection of letters. The order of the letters determines its meaning, and you cannot compute this order; it's not based on any logical set of rules (ask a linguist!) – Alex K. May 05 '16 at 12:48
  • 1
    @AlexK. That's very true, alright, so I need a way to use a dictionary, cool thank you. – 13aal May 05 '16 at 13:01
  • 1
    @13aal: The dictionary is easiest. You can however, generate things that look closer to real English words than random character strings (and may often be valid words), using approaches such as n-grams. If you like random generators, this is worth exploring, and might lead to a different question. But as your question is written, if you always want *real* English words, by far simplest to sample from a pre-defined word list. – Neil Slater May 05 '16 at 13:04
  • @NeilSlater Could you elaborate a little more? This sounds pretty interesting and might just be what I'm after. I was just looking to see if Mozilla had an API I could connect to and run through their list of words. – 13aal May 05 '16 at 13:20
  • 1
    @13aal: For an n-gram model, you would still at some point need to have processed a list of words. That list would be converted into some data (for tri-grams this might have, say, 5,000 entries) and a generator that works from that data. It would make your task harder. I cannot find any character-level n-gram generators in Ruby on a quick search. Typically things like fantasy name generators use them. – Neil Slater May 05 '16 at 13:45
  • Well looks like I'll have to make one then, thank you. – 13aal May 05 '16 at 13:46
  • 1
    Here's a blog about something related in Python: http://www.war-worlds.com/blog/2012/07/generating-names – Neil Slater May 05 '16 at 14:06
  • @NeilSlater Thank you, you've helped a lot. – 13aal May 05 '16 at 14:40
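
The character-level n-gram idea from the comments above can be sketched in plain Ruby. This is only a minimal illustration, not an existing library: `TRAINING_WORDS`, `build_model`, and `generate_word` are hypothetical names, the tiny training list is an assumption made for brevity, and a usable model would need to be trained on a far larger word list. The output looks English-like but is not guaranteed to be a real word.

```ruby
# Tiny training list for illustration only; a real model needs a much larger word list.
TRAINING_WORDS = %w(kitten mitten bitter butter letter better matter batter
                    winter winner dinner sinner simmer summer hammer hamster)

# Maps each two-letter context to the letters that followed it in the training words.
# "^" pads the start of a word and "$" marks its end.
def build_model(words)
  model = Hash.new { |hash, key| hash[key] = [] }
  words.each do |word|
    padded = "^^#{word}$"
    padded.chars.each_cons(3) { |a, b, c| model[a + b] << c }
  end
  model
end

# Walks the model from the start context until it draws the end marker
# or reaches max_length characters.
def generate_word(model, max_length: 10)
  word = ""
  context = "^^"
  while word.length < max_length
    nxt = model[context].sample
    break if nxt.nil? || nxt == "$"
    word += nxt
    context = context[1] + nxt
  end
  word
end

model = build_model(TRAINING_WORDS)
puts Array.new(5) { generate_word(model) }.join(" ")
```

Generated words can be shorter than 7 characters, so to honour the 7 to 10 character requirement you would call `generate_word` repeatedly and keep the first result whose length fits.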

5 Answers

5

Disclaimer: This answer is targeted at developers seeking a solution to this problem on Unix systems. It does not address the problem on non-Unix systems.


You can use Ruby's system calls to do this. Unix systems have built-in commands to grab random lines from files.

Good news: Unix systems typically also have an English word list at /usr/share/dict/words. So, in Ruby I would do

`shuf -n 1 /usr/share/dict/words`.chomp
=> "dastardly"

Note: Here I have used backticks to make the system call, and the shuf command gets you a random line from a file.

So the URL would be

random_word = `shuf -n 1 /usr/share/dict/words`.chomp
url = "#{random_word}#{body}.com/php?id=#{num}"
=> "wrongfullythisis_body_part.com/php?id=123"
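
Since the question mentions Windows, where `shuf` is not available, the same idea can be sketched in pure Ruby. This is a sketch under assumptions, not part of the original answer: `WORD_LIST_PATH` and `random_word` are hypothetical names, and the path must be changed to wherever you have saved a plain-text word list with one word per line.

```ruby
# Pure-Ruby stand-in for `shuf -n 1`; no shell is required.
# WORD_LIST_PATH is an assumption: point it at any one-word-per-line text file.
WORD_LIST_PATH = "/usr/share/dict/words"

# Returns one random lowercase word of min..max letters, or nil if none qualify.
def random_word(path = WORD_LIST_PATH, min: 7, max: 10)
  File.foreach(path)
      .map(&:chomp)
      .select { |w| w =~ /\A[a-z]{#{min},#{max}}\z/ }
      .sample
end

# Only runs the demo when the word list is actually present.
if File.exist?(WORD_LIST_PATH)
  puts "http://#{random_word}.com/php?id=#{rand(1..999)}"
end
```

The regular expression drops capitalised names and entries containing apostrophes, which would look odd in a host name.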
Shiva
  • He's working on Windows 7, so I'm not sure he will be able to use those commands/files. He would have to copy the dictionary and do something with a Ruby/PowerShell command. – Horacio May 05 '16 at 14:42
4

Try Faker; it's a cool gem to generate words, emails, URLs, or whatever you need.

https://github.com/stympy/faker

I've used it on many projects.

hb@hora ~ » irb
2.2.3 :001 > require 'faker'
 => true 
2.2.3 :002 > Faker::Lorem.sentence(3)
 => "Ea esse ex." 
2.2.3 :003 > Faker::Lorem.sentence(3)
 => "Fugiat odio harum." 
2.2.3 :004 > Faker::Lorem.words
 => ["consequuntur", "labore", "optio"] 
2.2.3 :005 > Faker::Lorem.word
 => "error" 
2.2.3 :006 > 

But if you are not able to add an external gem, you can create your own array/dictionary:

2.2.3 :013 > dict
 => ["Editors", "and", "critics", "of", "the", "plays", "disdaining", "the", "showiness", "and", "melodrama", "of", "Shakespearean", "stage", "representation", "began", "to", "focus", "on", "Shakespeare", "as", "a", "dramatic", "poet", "to", "be", "studied", "on", "the", "printed", "page", "rather", "than", "in", "the", "theatre", "The", "rift", "between", "Shakespeare", "on", "the", "stage", "and", "Shakespeare", "on", "the", "page", "was", "at", "its", "widest", "in", "the", "early", "19th", "century", "at", "a", "time", "when", "both", "forms", "of", "Shakespeare", "were", "hitting", "peaks", "of", "fame", "and", "popularity", "theatrical"] 
2.2.3 :014 > dict.sample
 => "the" 
2.2.3 :015 > dict.sample
 => "a" 
2.2.3 :016 > dict.sample
 => "disdaining" 
2.2.3 :017 > dict.sample
 => "century" 
2.2.3 :018 > 

That dictionary was created by copying and pasting text from Wikipedia into my own irb session and then scanning for all matches of /\w+/:

2.2.3 :023 > dict='n his own time, William Shakespeare (1564–1616) was rated as merely one among many talented playwrights and poets, but since the late 17th century he has been considered the supreme playwright and poet of the English language.'
 => "n his own time, William Shakespeare (1564–1616) was rated as merely one among many talented playwrights and poets, but since the late 17th century he has been considered the supreme playwright and poet of the English language." 
2.2.3 :024 > dict.scan(/\w+/)
 => ["n", "his", "own", "time", "William", "Shakespeare", "1564", "1616", "was", "rated", "as", "merely", "one", "among", "many", "talented", "playwrights", "and", "poets", "but", "since", "the", "late", "17th", "century", "he", "has", "been", "considered", "the", "supreme", "playwright", "and", "poet", "of", "the", "English", "language"]
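
Putting the pieces together, a minimal sketch of the whole generator might look like the following. It assumes you paste in your own source text; `build_dict` is a hypothetical helper name, and the length filter reflects the 7 to 10 character requirement from the question. Digits such as "1564" are excluded by scanning for letters only.

```ruby
# Sample text taken from the word array shown earlier; any pasted text works.
text = "Editors and critics of the plays disdaining the showiness and melodrama " \
       "of Shakespearean stage representation began to focus on Shakespeare"

# Keeps only alphabetic words of min..max characters, lowercased and de-duplicated.
def build_dict(text, min: 7, max: 10)
  text.scan(/[A-Za-z]+/)
      .map(&:downcase)
      .select { |w| w.length.between?(min, max) }
      .uniq
end

# Same URL shape as in the question, with a dictionary word as the host name.
def generate_url(dict)
  protocol = %w(https:// http://).sample
  "#{protocol}#{dict.sample}.com/php?id=#{rand(1..999)}"
end

dict = build_dict(text)
puts generate_url(dict)
```

The more text you feed into `build_dict`, the more distinct URLs the generator can produce.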
Horacio
  • 2
    `without using an external gem` – 13aal May 05 '16 at 13:41
  • So you can create an array (from your own dictionary) and do something like ["word1","word2","word3"].sample. You can copy a dictionary from the internet (https://www.randomlists.com/random-words) or just copy it from the Linux file. – Horacio May 05 '16 at 14:07
  • I've improved my answer. – Horacio May 05 '16 at 14:29
2

You can generate pronounceable, but meaningless, words by alternating vowels and consonants: tifa zakohu ayanipico wis kicevepys ijoxar uhiq ilay og luh tanise rijux tejod kuyasoq zov wu
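
A minimal sketch of that approach in Ruby, assuming the five standard vowels and treating every other letter (including "y") as a consonant; `pseudo_word` is a hypothetical name, and the default length of 7 to 10 follows the question's requirement:

```ruby
VOWELS = %w(a e i o u)
CONSONANTS = ("a".."z").to_a - VOWELS

# Builds a pronounceable but meaningless word by alternating consonants and vowels.
# The first letter is randomly a vowel or a consonant.
def pseudo_word(length = rand(7..10))
  start_with_vowel = [true, false].sample
  Array.new(length) do |i|
    use_vowel = i.even? == start_with_vowel
    (use_vowel ? VOWELS : CONSONANTS).sample
  end.join
end

puts "http://#{pseudo_word}.com/php?id=#{rand(1..999)}"
```

This needs no word list at all, at the cost of never guaranteeing a real word.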

rossum
1

On Unix systems you can pick random words from the system dictionary. You can usually find it at the path /usr/share/dict/words.

Ursus
  • I probably should have mentioned I'm on Windows. – 13aal May 05 '16 at 12:40
  • [How to get english language word database?](http://stackoverflow.com/questions/2213607/how-to-get-english-language-word-database) – Alex K. May 05 '16 at 12:41
  • For a list of English words (and some non-words) have a look at the [Diceware](http://world.std.com/~reinhold/diceware.html) site. That has two word lists, the Diceware list and the Beale list. – rossum May 05 '16 at 21:49
-1

I found another way to get a dictionary list: if you're using Windows and have access to Outlook, you can use Outlook's default.DIC file as a word list. This will give you tons of words; all you have to do is copy it over to the program. Reference is here

13aal