I'm writing my master's thesis and need to build time series from the COVID-19 data for my country, Italy. The data comes from this dataset: https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni.csv. From it I would like to create a new Excel file that contains only the columns with the date, the region name, and the total number of deaths. I will then build a time series for each Italian region. I hope I have explained myself well; thank you very much for your help. I am using RStudio.
-2
-
Take a look at `utils::read.csv()` and `xlsx::write.xlsx()`. To extract specific columns you can use something like `df <- df[, c('col1', 'col2')]`. – tivd Feb 13 '22 at 17:30
-
If you're at the point of working with R for your master's thesis, I'd assume you have some familiarity with it already. You're trying to subset data by column names. What have you tried that didn't work? For those of us who don't speak Italian, it's unclear which columns of your data you're trying to keep. – camille Feb 13 '22 at 17:39
-
Also, I would gently suggest not returning this data to Excel for analysis; that's why I didn't include an approach for exporting to Excel (see @Mel G's nice answer for that). You can do the time series analysis more efficiently directly in R. – langtang Feb 13 '22 at 18:02
2 Answers
1
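library(data.table)
# Read the regional CSV straight from GitHub and keep only
# the date, region name, and deaths columns
url <- "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni.csv"
data <- fread(url)[, .(data, denominazione_regione, deceduti)]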
Output
> data
data denominazione_regione deceduti
1: 2020-02-24 18:00:00 Abruzzo 0
2: 2020-02-24 18:00:00 Basilicata 0
3: 2020-02-24 18:00:00 Calabria 0
4: 2020-02-24 18:00:00 Campania 0
5: 2020-02-24 18:00:00 Emilia-Romagna 0
---
15137: 2022-02-13 17:00:00 Sicilia 9036
15138: 2022-02-13 17:00:00 Toscana 8637
15139: 2022-02-13 17:00:00 Umbria 1692
15140: 2022-02-13 17:00:00 Valle d'Aosta 514
15141: 2022-02-13 17:00:00 Veneto 13560
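
A possible next step, sketched here and not part of the original answer: since the goal is one time series per region, the long table can be split by `denominazione_regione`. The sketch below uses a small made-up table `dt` standing in for `data` so that it runs without a download. It assumes `deceduti` is a running total (the output above suggests so) and derives a hypothetical `nuovi_deceduti` column of daily new deaths, treating the count before the first report as zero.

```r
library(data.table)

# Toy stand-in for the `data` object built above (values are made up).
dt <- data.table(
  data = as.POSIXct(c("2020-02-24 18:00:00", "2020-02-25 18:00:00",
                      "2020-02-24 18:00:00", "2020-02-25 18:00:00"), tz = "UTC"),
  denominazione_regione = c("Abruzzo", "Abruzzo", "Veneto", "Veneto"),
  deceduti = c(0L, 1L, 2L, 5L)
)

# Sort so that each region's observations are in date order
setorder(dt, denominazione_regione, data)

# Daily new deaths = first difference of the cumulative count, per region.
# Assumption: deaths before the first reported date count as zero.
dt[, nuovi_deceduti := deceduti - shift(deceduti, fill = 0L),
   by = denominazione_regione]

# One table per region, collected in a named list
per_region <- split(dt, by = "denominazione_regione")
print(names(per_region))
print(per_region[["Veneto"]])
```

Each element of `per_region` can then be passed to whatever time series class the analysis needs.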

langtang
-
You have been a tremendous help, thank you very much. Now I would like a new dataset in which the first column is the date and each remaining column is a region, so that every row holds the deaths recorded on that date for each region. How could I do that? – Blancos95 Feb 15 '22 at 11:51
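
One way to get the layout asked for here (dates as rows, regions as columns) is to reshape the long table to wide format. This is a minimal sketch using `dcast()` from data.table, not something posted in the thread; the table `long` is a made-up stand-in for the three-column `data` object, so the values are illustrative only.

```r
library(data.table)

# Toy stand-in for the three selected columns (values are made up).
long <- data.table(
  data = as.POSIXct(c("2020-02-24 18:00:00", "2020-02-24 18:00:00",
                      "2020-02-25 18:00:00", "2020-02-25 18:00:00"), tz = "UTC"),
  denominazione_regione = c("Abruzzo", "Veneto", "Abruzzo", "Veneto"),
  deceduti = c(0L, 1L, 0L, 2L)
)

# One row per date, one column per region, cells hold `deceduti`
wide <- dcast(long, data ~ denominazione_regione, value.var = "deceduti")
print(wide)
```

If any date and region pair occurs more than once, `dcast()` falls back to counting rows instead of keeping the values, so it is worth checking for duplicates first. The resulting `wide` table could then be saved with `write.xlsx()` as in the other answer.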
-
1
How about something like this?
# Import the file (assumes the CSV has been downloaded to the working directory)
csv <- read.csv("dpc-covid19-ita-regioni.csv")
# Subset the three columns of interest and save them to a new object
csv2 <- csv[, c("data", "denominazione_regione", "deceduti")]
# Save to a new Excel file
library(openxlsx)
write.xlsx(csv2, 'name-of-your-excel-file.xlsx')

Mel G
-
You have been a tremendous help, thank you very much. Now I would like a new dataset in which the first column is the date and each remaining column is a region, so that every row holds the deaths recorded on that date for each region. How could I do that? – Blancos95 Feb 15 '22 at 11:49
-