I am using Kaggles gun violence dataset. My goal is to use Tableau for a interactive visualization for some of the regions and specifics relating to gun crimes there. My goal is to turn this dataframe into tidy format. Link:
https://www.kaggle.com/jameslko/gun-violence-data/version/1
With that being the case, there are a couple columns formatted like this that I am having issues wrangling in R. There are around 20 or so columns, these 4 are formatted like this:
A little background: there can be more than one gun involved in a crime, and more than one participant. Due to this, these columns contain information for each gun/participant split by '||'. The 0:, 1: ... indicates details for that specific gun/participant.
My goal is to capture the unique instances in each column and disregard the 0:, 1:, 2:, ...
Here is my code so far:
df= read.csv("C:/Users/rmahesh/Desktop/gun-violence-data_01-2013_03-2018.csv")
df$incident_id = NULL
df$incident_url = NULL
df$source_url = NULL
df$participant_name = NULL
df$participant_relationship = NULL
df$sources = NULL
df$incident_url_fields_missing = NULL
df$participant_status = NULL
df$participant_age_group = NULL
df$participant_type = NULL
df$incident_characteristics = NULL
#Subset of columns with formatting issues:
df2 = df[, c('gun_stolen', 'gun_type', 'participant_age', 'participant_gender')]
I have yet to run into an issue like this, and would love any help figuring out how to solve my problem. Any help would be greatly appreciated!
Edit1: I have created the first 3 rows of the columns in question. The format is identical more or less with some columns missing at times:
gun_stolen,gun_type,participant_age,participant_gender
0::Unknown||1::Unknown, 0::Unknown||1::Unknown, 0::25||1::31||2::33||3::34||4::33, 0::Male||1::Male||2::Male||3::Male||4::Male
0::Unknown||1::Unknown,0::22 LR||1::223 Rem [AR-15],0::51||1::40||2::9||3::5||4::2||5::15,0::Male||1::Female||2::Male||3::Female||4::Female||5::Male
0::Unknown,0::Shotgun,3::78||4::48,0::Male||1::Male||2::Male||3::Male||4::Male