I want to clean repeated lines in an HTML page in R

Asked Nov 10 '17 at 18:27

Active Nov 10 '17 at 21:29

Viewed 25 times

I want to clean repeated lines in an HTML page in R, and I have already ended to this. I only want to keep the country names. What pattern should I use?

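# Download the page and keep a raw copy on disk
mypage <- readLines('http://www.worldslongestwebsite.com')
write(mypage, "Raw Data.txt")
mypage[1:1000]
# Find the lines that mention 'currentVisitor', then inspect the block of interest
grep('currentVisitor', mypage)
mypage[230:1000]
# Collapse those lines into one comma-separated string
text <- toString(mypage[230:1000])
text
# Replace markup punctuation (and any digits right after it) with spaces
cleantext <- gsub(pattern = "[\"\\<\\>\\=/,:-][0-9]*", replacement = " ", text)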

Result for a couple of lines:

p class  c  Brazil  p   div                                                    div class  d   p class  a      p  p class  b     PM  p                        p class  c  Albania  p   div                                                    div class  d   p class  a      p  p class  b     PM  p                        p class  c  India  p   div                                                    div class  d   p class  a      p  p class  b     PM  p

edited Nov 10 '17 at 21:29

mplungjan

asked Nov 10 '17 at 18:27

Linda Tolis

Please show us the input which produces this result. And please format your code properly (indent it by at least 4 spaces). – Heri Nov 10 '17 at 18:30

It's been a while since we've been able to bump this answer: https://stackoverflow.com/a/1732454/1531971 – Nov 10 '17 at 18:33
Since you probably still want to do this even though all is (probably) lost, and the Elder Gods have been summoned: https://www.r-bloggers.com/string-functions-in-r/ – Nov 10 '17 at 18:36
What is "I have ended to this"? And why not try harder to show us code and expected output? – mplungjan Nov 10 '17 at 21:30
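
Going by the sample output in the question, each country name appears to sit inside a `<p class="c">` element. That markup is inferred from the cleaned text, not confirmed against the live page, so treat the following as a minimal sketch under that assumption. Instead of deleting everything that is not a country, it matches only the elements that hold one. The `sample_html` lines and the times in them are hypothetical stand-ins for `mypage[230:1000]`.

```r
# Hypothetical sample standing in for mypage[230:1000]; the real markup may differ.
sample_html <- c(
  '<div class="d"><p class="a"></p><p class="b">10:15 PM</p>',
  '<p class="c">Brazil</p></div>',
  '<div class="d"><p class="a"></p><p class="b">10:16 PM</p>',
  '<p class="c">Albania</p></div>',
  '<div class="d"><p class="a"></p><p class="b">10:17 PM</p>',
  '<p class="c">India</p></div>',
  '<div class="d"><p class="a"></p><p class="b">10:18 PM</p>',
  '<p class="c">Brazil</p></div>'
)

# Collapse the lines into a single string so one search covers everything.
text <- paste(sample_html, collapse = " ")

# Match only the <p class="c">...</p> elements (assumed to hold the country).
pattern <- '<p class="c">[^<]*</p>'
hits <- regmatches(text, gregexpr(pattern, text))[[1]]

# Strip the surrounding tags and any stray whitespace, leaving the bare names.
countries <- trimws(gsub("<[^>]+>", "", hits))

# unique() drops repeated names while keeping first-seen order.
unique_countries <- unique(countries)
print(unique_countries)
```

On the sample above this leaves Brazil, Albania and India, with the second Brazil dropped. If the real page nests other tags inside those paragraphs, the regular expression will miss them; an HTML parser such as the xml2 or rvest package would probably be sturdier, which is also the point of the linked answer.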

0 Answers