Extract specific columns and rows from an R dataset and create a new Excel file

Question

I'm writing the thesis for my master's degree and need to create time series from the COVID data for my country, Italy. The data is taken from this dataset: https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni.csv. From this dataset I would like to create a new Excel file that contains only the columns with the date, the name of the region, and the total number of deaths. From there I will go on to create a time series for every Italian region. I hope I have explained myself well; thank you very much for your help. I am using RStudio.

Take a look at `utils::read.csv()` and `xlsx::write.xlsx()`. To extract specific columns you can use something like `df <- df[, c('col1', 'col2')]`. — tivd, Feb 13 '22 at 17:30
If you're at the point of working with R for your master's thesis, I'd assume you have some familiarity with it already. You're trying to subset data by column names—what have you tried that didn't work? For those of us who don't speak Italian, it's unclear which columns of your data you're trying to keep — camille, Feb 13 '22 at 17:39
Also, I would gently suggest not returning this data to Excel for analysis (that's why I didn't include an approach to export to Excel; see @Mel G's nice answer for that). You can do the time series analysis more efficiently directly in R — langtang, Feb 13 '22 at 18:02
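
As a rough illustration of staying in R for the time series step, here is a minimal base R sketch. It assumes that `deceduti` is a running total per region, so daily figures come from differencing. The toy numbers and the column name `nuovi_deceduti` are made up for the example and are not part of the dataset.

```r
# Toy data in the same long layout (numbers invented for illustration)
df <- data.frame(
  data = as.Date(c("2020-03-01", "2020-03-02", "2020-03-03",
                   "2020-03-01", "2020-03-02", "2020-03-03")),
  denominazione_regione = rep(c("Abruzzo", "Veneto"), each = 3),
  deceduti = c(0, 1, 3, 2, 2, 5)
)

# Sort so that each region's rows are in date order
df <- df[order(df$denominazione_regione, df$data), ]

# Daily new deaths: keep the first value, then take day-to-day differences
# of the running total within each region (nuovi_deceduti is a hypothetical name)
df$nuovi_deceduti <- ave(df$deceduti, df$denominazione_regione,
                         FUN = function(x) c(x[1], diff(x)))

# One numeric vector per region, which could then be passed to ts() or similar
by_region <- split(df$nuovi_deceduti, df$denominazione_regione)
```

Each element of `by_region` is then a plain vector in date order for one region.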

score 1 · Answer 1 · answered Feb 13 '22 at 17:51

library(data.table)
url = "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-regioni/dpc-covid19-ita-regioni.csv"
data = fread(url)[,.(data,denominazione_regione,deceduti)]

Output

> data
                      data denominazione_regione deceduti
    1: 2020-02-24 18:00:00               Abruzzo        0
    2: 2020-02-24 18:00:00            Basilicata        0
    3: 2020-02-24 18:00:00              Calabria        0
    4: 2020-02-24 18:00:00              Campania        0
    5: 2020-02-24 18:00:00        Emilia-Romagna        0
   ---                                                   
15137: 2022-02-13 17:00:00               Sicilia     9036
15138: 2022-02-13 17:00:00               Toscana     8637
15139: 2022-02-13 17:00:00                Umbria     1692
15140: 2022-02-13 17:00:00         Valle d'Aosta      514
15141: 2022-02-13 17:00:00                Veneto    13560

You have been a tremendous help, thank you very much. Now, what if I want a new dataset in which the first column is the date and the remaining columns are the regions, so that each row holds the deaths confirmed for that date in every region? What could I do? — Blancos95, Feb 15 '22 at 11:51
`dcast(data, data~denominazione_regione, value.var="deceduti")` — langtang, Feb 15 '22 at 12:00
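
To make the `dcast()` suggestion concrete, here is a small self-contained sketch. It uses a toy table with invented values instead of the downloaded file, so it runs without network access; with the real data you would pass the three-column table from the answer instead of `long`.

```r
library(data.table)

# Toy stand-in for the three-column table (values are made up for illustration)
long <- data.table(
  data = as.POSIXct(c("2020-02-24 18:00:00", "2020-02-24 18:00:00",
                      "2020-02-25 18:00:00", "2020-02-25 18:00:00"), tz = "UTC"),
  denominazione_regione = c("Abruzzo", "Veneto", "Abruzzo", "Veneto"),
  deceduti = c(0L, 1L, 0L, 2L)
)

# One row per date, one column per region, cells filled from deceduti
wide <- dcast(long, data ~ denominazione_regione, value.var = "deceduti")
```

The first column of `wide` is the date, and every other column is one region.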

score 1 · Answer 2 · answered Feb 13 '22 at 17:53

How about something like this?

#Import file
csv <- read.csv("dpc-covid19-ita-regioni.csv")

#Subset for three columns of interest and save to new object
csv2 <- csv[,c("data", "denominazione_regione", "deceduti")]

#Save to new Excel file 
library(openxlsx)
write.xlsx(csv2, 'name-of-your-excel-file.xlsx')

Mel G


You have been a tremendous help, thank you very much. Now, what if I want a new dataset in which the first column is the date and the remaining columns are the regions, so that each row holds the deaths confirmed for that date in every region? What could I do? – Blancos95 Feb 15 '22 at 11:49
@Blancos95 Are you all set, or do you still need help? – Mel G Feb 15 '22 at 13:14
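
For the follow-up about one column per region, one possible base R approach is `reshape()`. This is only a sketch: the small `csv2` below is a toy stand-in with invented values so that the example runs on its own.

```r
# Toy stand-in for csv2 (values invented for illustration)
csv2 <- data.frame(
  data = c("2020-02-24", "2020-02-24", "2020-02-25", "2020-02-25"),
  denominazione_regione = c("Abruzzo", "Veneto", "Abruzzo", "Veneto"),
  deceduti = c(0, 1, 0, 2)
)

# Long to wide: one row per date, one deceduti column per region
wide <- reshape(csv2, idvar = "data", timevar = "denominazione_regione",
                v.names = "deceduti", direction = "wide")

# Drop the "deceduti." prefix that reshape() adds to the new column names
names(wide) <- sub("^deceduti\\.", "", names(wide))
```

The resulting `wide` data frame can be written out with `write.xlsx()` in the same way as `csv2`.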
