2

I am importing an excel file from the Robert Koch Institute into R using readxl::read_excel. However, I am experiencing problems with a row that contains only dates. The date format is DD.MM.YYYY, but in the resulting data frame, they appear as seemingly arbitrary numeric values. Here is the code that I am using:

library(readxl)
df <- read_excel("path/to/Fallzahlen_Kum_Tab_Archiv.xlsx", 
                                        sheet = "BL_7-Tage-Fallzahlen (fixiert)")
head(df)

which returns the following output:

# A tibble: 6 x 494
  `Fallzahlen in d~  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11 ...12 ...13 ...14 ...15
  <chr>             <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA                   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
2 NA                   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
3 NA                   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
4 NA                43957 43958 43959 43960 43961 43962 43963 43964 43965 43966 43967 43968 43969 43970
5 Baden-Württemberg   908   779   665   706   772   760   774   744   765   725   622   551   549   483
6 Bayern             1318  1246  1182  1223  1148  1151  1198  1194  1115  1022  1022   997   982   921
# ... with 479 more variables: ...16 <dbl>, ...

In row number 4, you can see the dates that are clearly not imported correctly. I already tried col_types="text" and col_types="list", as this suggested in other posts, but this doesn't seem to change anything.

The excel file can be downloaded in the "Archiv" section of RKI's webpage and this should be the direct download link.

Lukas D. Sauer
  • 355
  • 2
  • 11

1 Answers1

2

You can fix the data frame after loading it into R

# load data, replace ... with path and file
# dat <- read_xlsx(..., sheet="BL_7-Tage-Fallzahlen (fixiert)")
# get the dates
dates <- as.numeric(dat[4,])

# filter top rows with NAs
dat <- dat[5:nrow(dat),]

# put the dates as column names
colnames(dat)[-1] <- as.character(as.Date(dates[-1], origin="1899-12-30"))

head(dat)
# A tibble: 6 × 494
  `Fallzahlen in den letzte… `2020-05-06` `2020-05-07` `2020-05-08` `2020-05-09`
  <chr>                             <dbl>        <dbl>        <dbl>        <dbl>
1 Baden-Württemberg                   908          779          665          706
2 Bayern                             1318         1246         1182         1223
3 Berlin                              276          265          272          280
4 Brandenburg                         125          116          117          131
5 Bremen                              119          126          140          155
6 Hamburg                              95           85           86           89
# … with 489 more variables: 2020-05-10 <dbl>, 2020-05-11 <dbl>,
#   2020-05-12 <dbl>, 2020-05-13 <dbl>, 2020-05-14 <dbl>, 2020-05-15 <dbl>,
#   2020-05-16 <dbl>, 2020-05-17 <dbl>, 2020-05-18 <dbl>, 2020-05-19 <dbl>,
#   2020-05-20 <dbl>, 2020-05-21 <dbl>, 2020-05-22 <dbl>, 2020-05-23 <dbl>,
#   2020-05-24 <dbl>, 2020-05-25 <dbl>, 2020-05-26 <dbl>, 2020-05-27 <dbl>,
#   2020-05-28 <dbl>, 2020-05-29 <dbl>, 2020-05-30 <dbl>, 2020-05-31 <dbl>,
#   2020-06-01 <dbl>, 2020-06-02 <dbl>, 2020-06-03 <dbl>, 2020-06-04 <dbl>, …
Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29