I need to build a data frame from a website's data covering the last 7 days. I used this code:

d <- readLines(paste(con="https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR/data_od/%22,format(Sys.Date()-7,%22%25Y%25m%25d%22),%22/data_do/%22,format(Sys.Date(),%22%25Y%25m%25d"), sep = ""))
writeLines(d,con="test.csv")

When I create a data frame from this dataset, the column titles end up in the first row, and I need them to be the header instead.

I tried the usual

df <- data.frame(d)

But the same problem I described above appears.

Then I tried

df <- read.table(file="data/test.csv",sep=";",dec=",",header=T,stringsAsFactors=F)

but it seems the data was not saved as a file, because R couldn't find it.

zephryl
szwagro
  • The code you provided has multiple issues and doesn't result in a dataframe at all. The code first throws an error due to an extra close paren, then an invalid URL. You can't include R expressions like `format(Sys.Date()-7)` in a string and expect them to be parsed; you would need to list them separately in your `paste()` call, or use something like `stringr::str_glue()`. Also the `con` argument is for `readLines()`, not `paste()`. – zephryl Dec 27 '22 at 19:50
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Dec 27 '22 at 20:08
  • @szwagro, does the answer resolve your question? – r2evans Dec 28 '22 at 00:56
  • It does work, thanks! Sorry for those confusing details but those are beginnings.... All the best dude! – szwagro Dec 28 '22 at 07:13

1 Answer

  1. Your code contains URL encoding (aka percent-encoding), which both breaks the code and makes the question hard to follow. If you URLdecode(..) it, the extra-close-paren problem goes away, and the format(..) calls are no longer inside the string portion.

    cat(URLdecode('d <- readLines(paste(con="https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR/data_od/%22,format(Sys.Date()-7,%22%25Y%25m%25d%22),%22/data_do/%22,format(Sys.Date(),%22%25Y%25m%25d"), sep = ""))'),'\n')
    # d <- readLines(paste(con="https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR/data_od/",format(Sys.Date()-7,"%Y%m%d"),"/data_do/",format(Sys.Date(),"%Y%m%d"), sep = "")) 
    
  2. With that fixed, we can read the file contents rather simply:

    > d <- readLines(paste("https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR/data_od/",format(Sys.Date()-7,"%Y%m%d"),"/data_do/",format(Sys.Date(),"%Y%m%d"), sep = "")) 
    > str(d)
     chr [1:190] "Data;Godzina;Generacja \x9fr\xf3de\xb3 wiatrowych;Generacja \x9fr\xf3de\xb3 fotowoltaicznych" "2022-12-20;1;5257,113;0,000" "2022-12-20;2;5257,000;0,000" "2022-12-20;3;5247,488;0,000" "2022-12-20;4;5119,350;0,000" "2022-12-20;5;5028,475;0,000" ...
    

    It looks as if it should be readable by read.csv2 due to the ;-separated fields and ,-decimal indicator. We can try to read it directly instead of working with readLines, writeLines, and then read.csv*.

  3. But that results in an error:

    d <- read.csv2(paste("https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR/data_od/",format(Sys.Date()-7,"%Y%m%d"),"/data_do/",format(Sys.Date(),"%Y%m%d"), sep = ""))
    # Error in make.names(col.names, unique = TRUE) : 
    #   invalid multibyte string 3
    

    A quick search on SO yields many promising questions (e.g., Invalid multibyte string in read.csv), suggesting the use of fileEncoding=.
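
As a side note on step 2, the URL construction can be checked on its own, without any network access. The following is only a minimal sketch: `pse_url` and its `end`/`days` parameters are hypothetical names introduced here for illustration, and it uses `paste0()` in place of `paste(..., sep = "")`.

```r
# Hypothetical helper (not part of the original code): build the export URL
# for the date window from `end - days` to `end`.
pse_url <- function(end = Sys.Date(), days = 7) {
  base <- "https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR"
  paste0(base,
         "/data_od/", format(end - days, "%Y%m%d"),
         "/data_do/", format(end, "%Y%m%d"))
}

# A fixed date makes the result reproducible.
pse_url(as.Date("2022-12-27"))
# [1] "https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR/data_od/20221220/data_do/20221227"
```

Keeping the R expressions outside the quoted pieces is the point here: only the literal path segments are strings, and `format()` is evaluated before the pieces are joined.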

Given those fixes, we now have:

d <- read.csv2(paste("https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR/data_od/",
                     format(Sys.Date()-7, "%Y%m%d"), "/data_do/", format(Sys.Date(), "%Y%m%d"), sep = ""),
               fileEncoding = "latin1")
head(d)
#         Data Godzina Generacja..róde..wiatrowych Generacja..róde..fotowoltaicznych
# 1 2022-12-20       1                    5257.113                                 0
# 2 2022-12-20       2                    5257.000                                 0
# 3 2022-12-20       3                    5247.488                                 0
# 4 2022-12-20       4                    5119.350                                 0
# 5 2022-12-20       5                    5028.475                                 0
# 6 2022-12-20       6                    5077.925                                 0

(My system is not set up to handle UTF-8 or non-English characters well, hence the dots embedded in the column names. I hope they display more clearly on your console.)
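
One possible explanation for the dots, offered as an assumption rather than something verified against the server: the bytes `\x9f`, `\xf3`, and `\xb3` in the `str(d)` output match the Windows-1250 (CP1250) codes for ź, ó, and ł, which would make the header "Generacja źródeł wiatrowych". The sketch below works offline on two sample lines copied from that output; it converts them with `iconv()` and parses them with `read.csv2(text = ...)`, using `check.names = FALSE` so the column names are not rewritten by `make.names()`.

```r
# Sample lines shaped like the server response. The header bytes are copied
# from the str(d) output; treating them as CP1250 is an assumption.
raw_lines <- c(
  "Data;Godzina;Generacja \x9fr\xf3de\xb3 wiatrowych;Generacja \x9fr\xf3de\xb3 fotowoltaicznych",
  "2022-12-20;1;5257,113;0,000",
  "2022-12-20;2;5257,000;0,000"
)

# Re-encode from the assumed source encoding to UTF-8.
utf8_lines <- iconv(raw_lines, from = "CP1250", to = "UTF-8")

# Parse as ;-separated with , as the decimal mark; keep the header untouched.
d <- read.csv2(text = utf8_lines, check.names = FALSE)
names(d)
str(d)
```

If the assumption holds, passing `fileEncoding = "CP1250"` instead of `"latin1"` in the `read.csv2()` call on the URL may give the same readable headers, though how they print still depends on your locale.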

r2evans