
I am looking to read a .txt file from a URL. I run the following:

readLines(paste0("https://www.sec.gov/Archives/", All_file_today[Var]))

Here All_file_today[Var] contains the following URL path: 'edgar/data/99189/0001567619-22-004329.txt'

But it returns the error:

Error in file(con, "r") : 
  cannot open the connection to 'https://www.sec.gov/Archives/edgar/data/99189/0001567619-22-004329.txt'

When I copy this link and paste it into a web browser, it shows the content I am looking for without any problem. Does anyone know what I am doing wrong?

Following the feedback from Nad below, I ran the following:

> user <- paste('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7), AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.83 Safari/537.36')
> res <- GET(url, add_headers(`User-Agent` = user, Connection = 'keep-alive'))
> res
Response [https://www.sec.gov/Archives/edgar/data/1000097/0000919574-15-002406.txt]
  Date: 2022-03-29 01:32
  Status: 200
  Content-Type: text/plain
  Size: 5.44 kB
<SEC-DOCUMENT>0000919574-15-002406.txt : 20150225
<SEC-HEADER>0000919574-15-002406.hdr.sgml : 20150225
<ACCEPTANCE-DATETIME>20150225160223
ACCESSION NUMBER:       0000919574-15-002406
CONFORMED SUBMISSION TYPE:  13F-HR/A
PUBLIC DOCUMENT COUNT:      2
CONFORMED PERIOD OF REPORT: 20141231
FILED AS OF DATE:       20150225
DATE AS OF CHANGE:      20150225
EFFECTIVENESS DATE:     20150225
...
> readLines(content(res))
No encoding supplied: defaulting to UTF-8.
Error in file(con, "r") : cannot open the connection

From the above, I understand that the request reaches the file (status 200), but the readLines() call still fails. What could be the reason?

Rene Chan
  • When I try it, or try to read it with `read_html()` from the `rvest` package, I get a 403 error, suggesting that you may not be able to access these resources this way. – DaveArmstrong Mar 28 '22 at 16:47
  • It is helpful for folks answering questions to provide a reproducible example (https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example is a great place to look for other examples). In order to answer your question we can't just run your code because it assumes `All_file_today` and `Var` are in our local search path and they aren't. – Adam Hyland Mar 28 '22 at 18:20
  • Hi Adam, thanks a lot for your feedback. I have modified the code a bit. – Rene Chan Mar 29 '22 at 01:16

1 Answer


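We can read the file using the httr package:

library(httr)

url <- 'https://www.sec.gov/Archives/edgar/data/99189/0001567619-22-004329.txt'

# The request is filtered by user agent, so send a browser-like one
user <- paste('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0)',
            'Gecko/20100101 Firefox/98.0')

res <- GET(url, add_headers(`User-Agent` = user, Connection = 'keep-alive'))

# content() returns the body as one string, so split it into lines here;
# readLines() expects a file path or connection, not the text itself
strsplit(content(res, as = 'text', encoding = 'UTF-8'), '\r?\n')[[1]]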
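
A note on the last step, as far as I can tell: for a text/plain response, content() hands back the whole body as a single character string, and readLines() treats a string argument as a file path or URL to open. That would explain a "cannot open the connection" error even when the status is 200. The offline sketch below uses a short stand-in string (`body_text`, a hypothetical sample taken from the header lines shown in the question) in place of the real response body, and shows two ways to get a vector of lines from it.

```r
# Stand-in for content(res, as = "text", encoding = "UTF-8").
# This sample text is an assumption for illustration, not a real download.
body_text <- paste(
  "<SEC-DOCUMENT>0000919574-15-002406.txt : 20150225",
  "<SEC-HEADER>0000919574-15-002406.hdr.sgml : 20150225",
  sep = "\n"
)

# Option 1: wrap the string in a text connection so readLines() can read it.
con <- textConnection(body_text)
lines_a <- readLines(con)
close(con)

# Option 2: split on line breaks directly (handles both \n and \r\n).
lines_b <- strsplit(body_text, "\r?\n")[[1]]

print(lines_a)
```

Either form should give one element per line of the filing; which one to use is mostly a matter of taste.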
Nad Pat
  • This solves the problem by specifying the user agent, because the request is being filtered by user agent. If someone wants to use readLines() they can change the user agent for R as described at https://stackoverflow.com/a/4537050/1188479 and it will work fine too. – Adam Hyland Mar 28 '22 at 18:00
  • Hi Nad, this is solved. I found another post that recommended restarting RStudio, as simple as that; together with your instructions, it works. Thanks again. – Rene Chan Mar 29 '22 at 01:51
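
For anyone who prefers to stay with base readLines(), as the first comment above suggests, here is a minimal sketch of changing R's own user agent through the HTTPUserAgent option. The user agent string and the helper name `read_filing` are assumptions for illustration, and whether the server accepts a given string is not something this sketch can guarantee.

```r
# Hypothetical browser-like string; any value the server accepts would do.
ua <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0"

# options() returns the previous values, so they can be restored afterwards.
old_opts <- options(HTTPUserAgent = ua)

# Hypothetical helper, wrapped in a function so nothing is fetched
# until it is actually called.
read_filing <- function(path, base = "https://www.sec.gov/Archives/") {
  readLines(paste0(base, path))
}

# Example call (needs network access):
# filing <- read_filing("edgar/data/99189/0001567619-22-004329.txt")

# Restore the original setting when done:
# options(old_opts)
```

Keeping the previous options in `old_opts` means the change can be undone in the same session instead of lingering for later requests.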