I am a beginner learning web scraping in R with the 'rvest' package. My goal is to get the lyrics of any song from 'www.musixmatch.com'. This is my attempt:

library(rvest)
url <- "https://www.musixmatch.com/lyrics/Red-Hot-Chili-Peppers/Can-t-Stop"
musixmatch <- read_html(url)
lyrics <- musixmatch %>% html_nodes(".mxm-lyrics__content") %>% html_text()

This code creates a character vector 'lyrics' with 2 elements, containing the lyrics:

[1] "Can't stop addicted to the shindig\nChop top he says I'm gonna win big\nChoose not a life of imitation" 
[2] "Distant cousin to the reservation\n\nDefunkt the pistol that you pay for\nThis punk the feeling that you stay for\nIn time I want to be your best friend\nEastside love is living on the Westend\n\nKnock out but boy you better come to\nDon't die you know the truth is some do\nGo write your message on the pavement\nBurn so bright I wonder what the wave meant\n\nWhite heat is screaming in the jungle\nComplete the motion if you stumble\nGo ask the dust for any answers\nCome back strong with 50 belly dancers\n\nThe world I love\nThe tears I drop\nTo be part of\nThe wave can't stop\nEver wonder if it's all for you\nThe world I love\nThe trains I hop\nTo be part of\nThe wave can't stop\n\nCome and tell me when it's time to\n\nSweetheart is bleeding in the snow cone\nSo smart she's leading me to ozone\nMusic the great communicator\nUse two sticks to make it in the nature\nI'll get you into penetration\nThe gender of a generation\nThe birth of every other nation\nWorth your weight the gold ... <truncated>

The problem is that the 2nd element gets truncated at some point. As far as I know, rvest has no parameter to adjust truncation, and I could not find anything online about this issue. Does anybody know how to adjust or disable this truncation? Thanks a lot in advance!

Best regards,

Jan

  • Is it truncated itself, or just the print display? Try writing to a text file so you can see it in full. – sebastian-c Feb 17 '17 at 14:58
  • Actually, maybe this would solve your problem?: http://stackoverflow.com/questions/36800475/avoid-string-printed-to-console-getting-truncated-in-rstudio – sebastian-c Feb 17 '17 at 14:59
  • I can't reproduce the problem. – Dason Feb 17 '17 at 15:01
  • @sebastian-c, I cannot believe that it was that simple: thank you very much! – Jan-Benedikt Jagusch Feb 17 '17 at 15:04
  • _"you agree that you will not: modify, adapt, translate, or reverse engineer any portion of Musixmatch or its contents, or use any robot, spider, site search/retrieval application, or other device to retrieve or index any portion of the Website and/or Application"_. Does no one on SO read EULAs/ToC/ToS? – hrbrmstr Feb 18 '17 at 02:42

1 Answer

I think it's better to copy and paste the lyrics into Notepad or WordPad and save them as a .txt file.

Then use the readLines function. It prints a warning message, but I was able to read the entire lyrics into a character vector of length 84, which you can clean or process however you please.

> words <- readLines("redhot.txt")
> head(words)
  [1] "Can't stop addicted to the shindig"     
  [2] "Chop top he says I'm gonna win big"     
  [3] "Choose not a life of imitation"         
  [4] "Distant cousin to the reservation"      
  [5] "Defunkt the pistol that you pay for"    
  [6] "This punk the feeling that you stay for"

No truncation problem here.
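
As the comments suggest, the "<truncated>" marker most likely comes from how the console prints long strings, not from the stored data. Here is a minimal sketch for checking this without copying anything by hand. The 'lyrics' vector below is a made-up stand-in for the scraped one, and the temporary file name is arbitrary:

```r
# Hypothetical stand-in for the scraped 'lyrics' vector (not the full song)
lyrics <- c(
  "Can't stop addicted to the shindig\nChop top he says I'm gonna win big",
  paste(rep("Distant cousin to the reservation", 200), collapse = "\n")
)

# nchar() reports the stored length of each element,
# independent of how the console chooses to display it
nchar(lyrics)

# cat() writes the raw text to the console, so "\n" becomes a real line break
cat(lyrics[1], sep = "\n")

# Write the text to a file and read it back, one element per line
out_file <- tempfile(fileext = ".txt")
writeLines(lyrics, out_file)
words <- readLines(out_file)
head(words)
```

If nchar() returns a count larger than what the console displays, the string itself is complete and only the printed view is shortened.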

mikeymike