Data wrangling large database divided by year

Question

I am trying to create a vertical data frame from a really large crime database arranged horizontally. Its first 5 columns describe the monthly data which is presented in the following 5:16 columns. It is a yearly database, so underneath the data for a certain year is the data for the next year. What I want to do is arrange all data for a specific crime for a specific state for all years in vertical columns.

Database to arrange head

Since some rows of data are not complete I have to add an if statement that checks, since otherwise I would have to run the code and solve each problem manually until it worked.

Edit: I am trying to find a way to post the csv file containing the data, yet my stack overflow skills are pretty new. I am adding a link that contains the csv file. https://drive.google.com/file/d/1JU-mYOh68bzSHUF6weT0io1xVuNCgXFd/view?usp=share_link

I created a code like this:

## I added these base modifications to the df
df_15<-read.csv("Incidencia_municipal_2015_2022.csv", fileEncoding = "latin1")
df_15<-df_15%>%select(-colnames(df_15)[c(4,6)])%>%filter(Año<2022)%>%unite(col="Entidad", colnames(df_15)[c(3,5)])%>%unite(col="Tipo",colnames(df_15)[7:9])
df_15$Tipo<-sub(" ", "-", df_15$Tipo)
df_15$Entidad<-sub(" ", "-", df_15$Entidad) 
## 

tn<-unique(df_15$Tipo)
glos2<-data.frame(seq(1:98),tn)
ts<-rep(seq(1:98), times=16219)
df_15$Tipo<-ts
sn<-unique(df_15$Entidad)
cnt<-0
na<-c()

for(m in 1:length(sn)){
  if(m==1){
    abs<-matrix(,nrow = 84,ncol=242844)
    nmsx<-seq(1:242844)
  }
  
  for (i in 1:length(tn)) {
      agsx<-filter(df_15, Entidad==sn[m], Tipo==i)%>%select(colnames(df_15)[5:16])
      agsx<-c(as.matrix(agsx))
      t<-(m-1)*length(tn)+i
      if(length(agsx)==nrow(abs)){
        abs[,t]<-agsx } else{ 
          cnt<-cnt+1
          na[cnt]<-m
        } 
      
      nmsx[t]<-paste(m,i, sep="_")
    }
  
  
  if(m==2466){
    data_2015m<-as_tibble(abs)
    colnames(data_2015m)<-nmsx
  }
  
}

The code works. Yet, it takes forever to do its work. Is there an easier way to do this?

Welcome to StackOverflow. Can you make your post [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by providing your data using `dput(df_15)` or `dput(head(df_15))`? Please avoid posting images of code or data. — jrcalabrese, Nov 22 '22 at 16:26
It seems like you're trying to hard-code a pivot from wide format to long format. Instead I'd suggest looking at the FAQ [How to pivot data from wide to long?](https://stackoverflow.com/q/2185252/903061) There are quite a few methods there, I'd recommend `pivot_longer`. If you need more help than that, please do add some data we can actually work with - it's difficult to demonstrate code on a picture. — Gregor Thomas, Nov 22 '22 at 16:39
I added an edit. Maybe that can be useful. Thanks for the help! — emithemex, Nov 23 '22 at 17:23

Data wrangling large database divided by year

0 Answers0