
I would like to use a website from R. The website is http://soundoftext.com/, where I can download WAV files with the audio of a given text in a given language (voice).

There are two steps to download the voice as a WAV file:
1) Insert the text, select the language, and click Submit.
2) In the new window, click Save and select a folder.

So far, I can get the XML tree, convert it to a list, and modify the values of the text and language. However, I don't know how to convert the list back to XML (with the new values) and submit it. After that, I would also need to do the second step.

Here is my code so far:

require(RCurl)
require(XML)

# Download the page HTML and parse it into a tree
webpage <- getURL("http://soundoftext.com/")
webpage <- readLines(tc <- textConnection(webpage)); close(tc)
pagetree <- htmlTreeParse(webpage, error = function(...){}, useInternalNodes = TRUE)

# Convert the parsed tree to a nested R list
x <- xmlToList(pagetree)

# Insert the word
x$body$div$div$div$form$div$label$.attrs[[1]] <- "Raúl"
x$body$div$div$div$form$div$label$.attrs[[1]]

# Select the language
x$body$div$div$div$form$div$select$option$.attrs <- "es"
x$body$div$div$div$form$div$select$option$.attrs

I have followed this approach, but I get an error related to "tag".
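Editing the `xmlToList()` result only changes a local R object; the server never sees it, so converting it back to XML would not submit anything either. What a script actually needs from the page is the form's `action`, its `method`, and the names of its fields, which it then sends in its own HTTP request. A minimal sketch of pulling those out with the XML package, run on a hypothetical, simplified form: the markup below is reconstructed from the endpoint (`/sounds`) and field names (`text`, `lang`, `submit`) used in the httr answer, and the real 2017 page may have looked different.

```r
library(XML)

# Hypothetical markup: a simplified stand-in for the 2017 soundoftext.com form.
# The real page's structure and attributes may have differed.
html <- '
<form action="/sounds" method="post">
  <textarea name="text"></textarea>
  <select name="lang">
    <option value="en">English</option>
    <option value="es">Spanish</option>
  </select>
  <input type="submit" name="submit" value="save">
</form>'

doc <- htmlParse(html, asText = TRUE)

# Where and how the browser sends the data
form_action <- xpathSApply(doc, "//form", xmlGetAttr, "action")
form_method <- xpathSApply(doc, "//form", xmlGetAttr, "method")

# Names of the fields the server expects (elements carrying a name attribute)
field_names <- xpathSApply(doc, "//form//*[@name]", xmlGetAttr, "name")

print(form_action)
print(form_method)
print(field_names)
```

With the endpoint and field names in hand, the request can be sent directly (for example with `httr::POST(..., encode = "form")`) instead of rebuilding the page's XML.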

UPDATE: I just tried using rvest to download the audio file; however, it does not respond or trigger anything. What am I doing wrong (or missing)?

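library(rvest)

url <- "http://soundoftext.com/"
s <- html_session(url)

# Read the page's form and fill in the text and language fields
f0 <- html_form(s)
f1 <- set_values(f0[[1]], text = "Raúl", lang = "es")

# Attempts to point the form at the Submit/Save button
attr(f1, "type") <- "Submit"
s[["fields"]][["submit"]] <- f1
attr(f1, "Class") <- "save"

test <- submit_form(s, f1)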
R user
  • You will probably have a better time with the `rvest` package and its `html_form` function. – GGamba Feb 15 '17 at 15:59
  • Thank you @GGamba. I've updated the post with code following your recommendation. However, it still does not work. What am I doing wrong? – R user Feb 15 '17 at 16:57

2 Answers


I see nothing wrong with your approach, and it was worth a try; that's what I would have written too.
The page is somewhat annoying in that it uses jQuery to append new divs at each request. I still think it should be possible to do this with rvest, but I found a fun workaround using the httr package:

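library(httr)

# Endpoint the form posts to
url <- "http://soundoftext.com/sounds"

# Form fields, as the browser sends them
fd <- list(
  submit = "save",
  text = "Banana",
  lang = "es"
)

# Send the form and read the sound's ID from the response
resp <- POST(url, body = fd, encode = "form")
id <- content(resp)$id

# Request the sound by ID; mode = "wb" avoids corrupting binary files on Windows
download.file(URLencode(paste0("http://soundoftext.com/sounds/", id)), destfile = "test.mp3", mode = "wb")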

Essentially, when we send the POST request to the server, an ID comes back; if we then simply GET that ID, we can download the file.
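To make the first step less opaque: with `encode = "form"`, the fields are sent as an `application/x-www-form-urlencoded` body, i.e. each name and value percent-encoded and joined as `name=value` pairs separated by `&`. httr builds this for you; the base-R sketch below only shows roughly what travels in the request. `form_body()` is a hypothetical helper, and httr's own encoder may differ in small details.

```r
# Hypothetical helper: build a form-urlencoded body from a named list.
# reserved = TRUE also escapes characters such as & and =;
# repeated = TRUE makes sure a literal % is always escaped.
form_body <- function(fields) {
  keys <- vapply(names(fields), URLencode, character(1),
                 reserved = TRUE, repeated = TRUE)
  vals <- vapply(unlist(fields), URLencode, character(1),
                 reserved = TRUE, repeated = TRUE)
  paste(keys, vals, sep = "=", collapse = "&")
}

form_body(list(submit = "save", text = "Banana", lang = "es"))
# [1] "submit=save&text=Banana&lang=es"
```

Seen this way, the first request is just a few key/value pairs, which is why it is easier to send them directly than to drive the page's jQuery-based interface.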

GGamba
  • Thank you again @GGamba. It downloads the audio; however, the file has zero length and cannot be played in any player. What is the problem? – R user Feb 15 '17 at 19:57
  • Apologies, I hadn't tested it. The URL where the mp3 file resides is different; try this: `download.file(URLencode(paste0("http://soundoftext.com/static/sounds/", fd[['lang']], '/', fd[['text']], '.mp3')), 'test.mp3', mode = 'wb')` – GGamba Feb 15 '17 at 20:05
  • It works! Thank you. By the way, do you know how to handle words with accents such as ` and ´, which are common in some languages? – R user Feb 15 '17 at 20:29
  • `cannot open URL 'http://soundoftext.com/static/sounds/it/Ra%FAl.mp3': HTTP status was '404 NOT FOUND'`. This is what I get when trying "Raúl" as the text. – R user Feb 15 '17 at 20:36
  • I got `trying URL 'http://soundoftext.com/static/sounds/it/Ra%C3%BAl.mp3'`. It's probably a `locale` problem. That's always a headache for me, coming from a non-English country. Try `Sys.setlocale('LC_ALL', 'C')`, but I can't help much more; it may be worth a new question. – GGamba Feb 15 '17 at 20:43
  • I'll check the configuration. Thank you. – R user Feb 15 '17 at 20:53
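The two URLs in these comments differ only in how "ú" was turned into bytes: `%FA` is its single Latin-1 byte, while `%C3%BA` is its two-byte UTF-8 encoding. A sketch of one way to get the UTF-8 form regardless of how the string is stored in the session is to convert with `enc2utf8()` and percent-encode the raw bytes yourself, rather than relying on `URLencode()` seeing UTF-8 input. The helper names are hypothetical, the assumption that the server expects UTF-8 is inferred from GGamba's working URL, and the `/static/sounds/` scheme is the 2017 one, which likely no longer works since the site was redesigned.

```r
# Hypothetical helper: percent-encode a string from its UTF-8 bytes, so the
# result does not depend on whether the input is stored as Latin-1 or UTF-8.
percent_encode_utf8 <- function(x) {
  codes <- as.integer(charToRaw(enc2utf8(x)))
  # RFC 3986 unreserved characters are kept as-is
  unreserved <- (codes >= 0x30 & codes <= 0x39) |     # 0-9
                (codes >= 0x41 & codes <= 0x5A) |     # A-Z
                (codes >= 0x61 & codes <= 0x7A) |     # a-z
                codes %in% c(0x2D, 0x2E, 0x5F, 0x7E)  # - . _ ~
  pieces <- ifelse(unreserved,
                   vapply(codes, intToUtf8, character(1)),
                   sprintf("%%%02X", codes))
  paste(pieces, collapse = "")
}

# Hypothetical helper for the 2017 static-file URL scheme from GGamba's comment
static_sound_url <- function(text, lang) {
  paste0("http://soundoftext.com/static/sounds/", lang, "/",
         percent_encode_utf8(text), ".mp3")
}

# "\u00fa" is ú: one byte in Latin-1, two bytes in UTF-8
charToRaw(iconv("\u00fa", "UTF-8", "latin1"))  # [1] fa
charToRaw(enc2utf8("\u00fa"))                  # [1] c3 ba

static_sound_url("Ra\u00fal", "it")
# [1] "http://soundoftext.com/static/sounds/it/Ra%C3%BAl.mp3"
```

Once the URL is built, `download.file(url, "raul.mp3", mode = "wb")` saves it; `mode = "wb"` matters on Windows, where the default text mode can corrupt binary files.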

Creator of Sound of Text here. Sorry it took so long for me to find this post.

I just redesigned Sound of Text, so your HTML parsing probably won't work anymore. However, there is now an API you can use, which should make things considerably easier for you.

You can find the documentation here: https://soundoftext.com/docs

I apologize if it's not very good. Please let me know if you have any questions.

florabtw