0

I have tried my best searching for similar questions but couldn't find exactly what I was looking for. I have a data set on crime statistics per province in the Netherlands spanning 2012-2021. The data that I got from the police has all of the different crimes registered in one single column and gives that data per month of every year. For my analysis I would actually like to use every crime as a variable so each crime type needs to be in its own columns with the right amount of registered crimes. I am absolutely stumped as to how to transpose this data so that it works.

In case I am not clear with what I mean here an example.

My data looks like this at the moment:

      ID crime_name  Province Year Month Reg_crimes
1  1      Total Groningen 2012    01       2913
2  2      Total Groningen 2012    02       2399
3  3      Total Groningen 2012    03       2870
4  4      Total Groningen 2012    04       2649
5  5      Total Groningen 2012    05       2891
6  6      Total Groningen 2012    06       2710

With crime_name being the 48 different types of crime (for this example it just gives me Total but there are a lot more), and Reg_crimes being the amount of registered crimes for that type in that certain month.

Ideally I would want it to look like this:

  ID  Province Year Month      Total  Type_1  Type_2 Type_3 ....... Type_48
1  1 Groningen 2012    01       2913
2  2 Groningen 2012    02       2399
3  3 Groningen 2012    03       2870
4  4 Groningen 2012    04       2649
5  5 Groningen 2012    05       2891
6  6 Groningen 2012    06       2710

I hope this is clear, I am having a bit of difficulty trying to put it into words, but I will happily answer all questions in case there is some confusion.

Here is a sample of my data:

ds <- structure(list(ID = 1:6, crime_name = c("Total", "Total", "Total", 
"Total", "Total", "Total"), Province = c("Groningen", "Groningen", 
"Groningen", "Groningen", "Groningen", "Groningen"), Year = c("2012", 
"2012", "2012", "2012", "2012", "2012"), Month = c("01", "02", 
"03", "04", "05", "06"), Reg_crimes = c("    2913", "    2399", 
"    2870", "    2649", "    2891", "    2710")), row.names = c(NA, 
6L), class = "data.frame")

1 Answers1

1

Here is a tidyr solution that also uses the new base R pipe operator.

ds |> tidyr::pivot_wider(
  id_cols = -c(crime_name, Reg_crimes),
  names_from = crime_name,
  values_from = Reg_crimes
)
## A tibble: 6 x 5
#     ID Province  Year  Month Total     
#  <int> <chr>     <chr> <chr> <chr>     
#1     1 Groningen 2012  01    "    2913"
#2     2 Groningen 2012  02    "    2399"
#3     3 Groningen 2012  03    "    2870"
#4     4 Groningen 2012  04    "    2649"
#5     5 Groningen 2012  05    "    2891"
#6     6 Groningen 2012  06    "    2710"

Without the pipe:

tidyr::pivot_wider(
  data = ds,
  id_cols = -c(crime_name, Reg_crimes),
  names_from = crime_name,
  values_from = Reg_crimes
)
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • This is almost 100% right. I get the crime types into the columns. But only total get the values (all other columns get NA). –  Sep 11 '21 at 14:32
  • @Simba63 Can you post a data set with other crime types? – Rui Barradas Sep 11 '21 at 14:46
  • I could send you my data set and then you could take a look? –  Sep 11 '21 at 14:52
  • I tried doing it by removing the ID and then using this `ds3 <- pivot_wider(ds3, names_from = crime_name, values_from = Reg_crimes, values_fn = list)`. This works only then I get `c("160", "15")` in one of the columns. –  Sep 11 '21 at 15:02
  • Nevermind, this is a fault in my data set. Thank you so much for your help! –  Sep 11 '21 at 15:07