I have a df in R that I have loaded using:
data <- fread("Data/LuminateDataExport_UTDP2_011818.csv", colClasses = 'character', stringsAsFactors = FALSE)
I did this because I had to perform certain operations like stripping "$", etc.
Now, I am trying to convert the columns into the appropriate formats without having to as._
each column individually...
The structure of the current df is:
> str(data)
Classes ‘data.table’ and 'data.frame': 196879 obs. of 32 variables:
$ city : chr "" "" "" "" ...
$ company_goal : chr "" "" "" "" ...
$ company_name : chr "" "" "" "" ...
$ event_date : chr "5/14/2016" "9/26/2015" "9/12/2015" "6/3/2017" ...
$ event_year : chr "FY 2016" "FY 2016" "FY 2016" "FY 2017" ...
$ fundraising_goal : chr "250" "200" "350" "0" ...
$ name : chr "Heart Walk 2015-2016 St. Louis MO" "Heart Walk 2015-2016 Canton, OH" "Heart Walk 2015-2016 Dallas, TX" "FDA HW 2016-2017 Albany, NY WO-65355" ...
$ participant_id : chr "2323216" "2273391" "2419569" "4088558" ...
$ state : chr "" "OH" "TX" "" ...
$ street : chr "" "" "" "" ...
$ team_average : chr "176" "123" "306" "47" ...
$ team_captain : chr "No" "No" "Yes" "No" ...
$ team_count : chr "7" "6" "4" "46" ...
$ team_id : chr "152788" "127127" "45273" "179207" ...
$ team_member_goal : chr "0" "0" "0" "0" ...
$ team_name : chr "Team Clayton" "Cardiac Crusaders" "BIS - Team Myers" "Independent Walkers" ...
$ team_total_gifts : chr "1,230 " "738" "1,225 " "2,145 " ...
$ zip : chr "" "" "" "" ...
$ gifts_count : chr "2" "1" "2" "1" ...
$ registration_gift: chr "No" "No" "No" "No" ...
$ participant_gifts: chr "236" "218" "225" "0" ...
$ personal_gift : chr "0" "0" "0" "250" ...
$ total_gifts : chr "236" "218" "225" "250" ...
$ match_code : chr "UX000" "UX000" "UX000" "UX000" ...
$ tap_level : chr "X" "X" "X" "X" ...
$ tap_desc : chr "" "" "" "" ...
$ tap_lifed : chr "" "" "" "" ...
$ medage_cy : chr "0" "0" "0" "0" ...
$ divindx_cy : chr "0" "0" "0" "0" ...
$ medhinc_cy : chr "0" "0" "0" "0" ...
$ meddi_cy : chr "0" "0" "0" "0" ...
$ mednw_cy : chr "0" "0" "0" "0" ...
- attr(*, ".internal.selfref")=<externalptr>
Now, as a first step- I am trying to convert all of the numbers to_numeric
.
I have tried every one of the solutions found here but none of them have worked.
The errors I keep getting are:
Error in
[.data.table
(data, , cols) : j (the 2nd argument inside [...]) is a single symbol but column name 'cols' is not found. Perhaps you intended DT[,..cols] or DT[,cols,with=FALSE]. This difference to data.frame is deliberate and explained in FAQ 1.1.
AND
Error in
[.data.table
(data, cols) : When i is a data.table (or character vector), the columns to join by must be specified either using 'on=' argument (see ?data.table) or by keying x (i.e. sorted, and, marked as sorted, see ?setkey). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.
Here some more info on the data:
> dput(data[1:6, 1:11])
structure(list(city = c("", "", "", "", "", ""), company_goal = c("",
"", "", "", "", ""), company_name = c("", "", "", "", "", ""),
event_date = c("5/14/2016", "9/26/2015", "9/12/2015", "6/3/2017",
"5/6/2017", "10/17/2015"), event_year = c("FY 2016", "FY 2016",
"FY 2016", "FY 2017", "FY 2017", "FY 2016"), fundraising_goal = c("250",
"200", "350", "0", "0", "100"), name = c("Heart Walk 2015-2016 St. Louis MO",
"Heart Walk 2015-2016 Canton, OH", "Heart Walk 2015-2016 Dallas, TX",
"FDA HW 2016-2017 Albany, NY WO-65355", "FDA HW 2016-2017 New Haven, CT WO-66497",
"Heart Walk 2015-2016 Puget Sound, WA"), participant_id = c("2323216",
"2273391", "2419569", "4088558", "4527010", "2424207"), state = c("",
"OH", "TX", "", "", "WA"), street = c("", "", "", "", "",
""), team_average = c("176", "123", "306", "47", "0", "97"
)), .Names = c("city", "company_goal", "company_name", "event_date",
"event_year", "fundraising_goal", "name", "participant_id", "state",
"street", "team_average"), class = c("data.table", "data.frame"
), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x10200c378>)
Suggestions please?
(Once I do this, I will also have to convert different columns to factors, etc)