Why can't I read the downloaded file with readLines? How can I read it?

url="http://www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
txt=download.file(url,destfile="stock")
> file1=readLines("stock",encoding="big5")
Warning messages:
1: In readLines("stock", encoding = "big5") :
invalid input found on input connection 'stock'
2: In readLines("stock", encoding = "big5") :
incomplete final line found on 'stock'
> file1=readLines("stock",encoding="gbk")
Warning messages:
1: In readLines("stock", encoding = "gbk") :
invalid input found on input connection 'stock'
2: In readLines("stock", encoding = "gbk") :
incomplete final line found on 'stock'
> file1=readLines("stock",encoding="gb2132")
Warning messages:
1: In readLines("stock", encoding = "gb2132") :
invalid input found on input connection 'stock'
2: In readLines("stock", encoding = "gb2132") :
incomplete final line found on 'stock'
> file1=readLines("stock",encoding="gb18030")
Warning messages:
1: In readLines("stock", encoding = "gb18030") :
 invalid input found on input connection 'stock'
2: In readLines("stock", encoding = "gb18030") :
incomplete final line found on 'stock'

The result contains only part of the file's contents; much of it is missing. Why?

Dd Pp
  • I realize that English may not be your first language, but please in the future try to pay more attention to your spelling and grammar. – joran Sep 06 '12 at 03:28
  • Try opening the file 'stock' in a text editor. It's possible this is an encoding issue: try reading the help page for readLines to see how to control encoding. To understand encoding, see any of the thousands of pages about it here on Stack Overflow. – Alex Brown Sep 06 '12 at 03:29
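
On the encoding point: according to the readLines help page, its `encoding` argument only marks the resulting strings and is not used to re-encode the input; re-encoding is requested on the connection instead. Below is a minimal sketch, assuming the page is Big5-encoded (an assumption, not verified for this URL). `read_lines_as` is a hypothetical helper name, and the demo writes its own small Big5 file so it runs offline.

```r
# Hypothetical helper: read a text file whose bytes are in encoding `from`.
# Assumption: "BIG5" is a name your platform's iconv understands.
read_lines_as <- function(path, from = "BIG5") {
  con <- file(path, encoding = from)  # the connection does the re-encoding
  on.exit(close(con))                 # destroy the connection when done
  readLines(con, warn = FALSE)        # warn = FALSE hides "incomplete final line"
}

# Offline demo: write two Big5-encoded lines to a temp file, then read them back.
original <- c("\u80a1\u4efd", "\u4ee3\u865f")
big5_bytes <- iconv(paste(original, collapse = "\n"), "UTF-8", "BIG5", toRaw = TRUE)[[1]]
tmp <- tempfile()
writeBin(big5_bytes, tmp)
lines <- read_lines_as(tmp)
```

For the downloaded file this would be `read_lines_as("stock")`. The result is still raw HTML source, one string per line, not the table itself.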

1 Answer

The file contains 18 lines, and my R reads all of these 18 lines. I suspect that you're trying to ignore the difference between a text file and an HTML file. To extract the HTML table, you'll need to use something like this.
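
A minimal sketch of pulling a table out of HTML in R, assuming the XML package is installed (`install.packages("XML")`). The inline HTML, the column names, and the variable names are made up for illustration; the real page's layout may differ, and which table index holds the stock list is something you would have to check yourself.

```r
library(XML)  # assumption: the XML package is available

# Made-up stand-in for the downloaded page, so the example runs offline.
html <- "<html><body><table>
<tr><th>code</th><th>name</th></tr>
<tr><td>00001</td><td>Alpha</td></tr>
<tr><td>00002</td><td>Beta</td></tr>
</table></body></html>"

# Parse the HTML into a document tree instead of treating it as lines of text.
doc <- htmlParse(html, asText = TRUE)

# Convert every <table> in the document to a data frame; take the first one.
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)
stocks <- tables[[1]]

# For the downloaded file, something along these lines (encoding is an assumption):
# doc <- htmlParse("stock", encoding = "BIG5")
```

The point of going through a parser is that rows and cells are defined by the `<tr>` and `<td>` tags, not by line breaks in the file, so the number of lines `readLines` reports says little about how many table rows the page holds.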

themel
  • When you download the file and open it, it contains at least 1800 lines. – Dd Pp Sep 06 '12 at 07:17
  • Nope, there are precisely 18 lines in that file when it's interpreted as text. The rest is HTML - http://en.wikipedia.org/wiki/HTML – themel Sep 06 '12 at 08:26
  • OK, the stock file is an HTML file. Now I want to read it as text with `readLines("stock")`. Why can't I? – Dd Pp Sep 06 '12 at 09:49
  • You can't read it as a text file because it IS not a text file - see the link in my answer on how to extract data from HTML. – themel Sep 06 '12 at 10:45