I am trying to perform some analysis on the Online News Popularity dataset from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/online+news+popularity
The dataset has a set of 7 boolean attributes that denote the day of the week on which the article was published. For example, the column weekday_is_monday will have the value 1 if the article was published on a Monday, and so on. For my analysis, I am trying to merge these fields into a single field that contains the string literal of the publishing day.
So I load the dataset, then go through and replace each true value (1) with the corresponding day name:
library(dplyr)  # provides %>%, mutate_at(), na_if() and coalesce() used below

news <- read.csv("path_to_my_dataset",
                 header = TRUE,
                 sep = ",",
                 fill = FALSE,
                 strip.white = TRUE,
                 stringsAsFactors = FALSE)
news$weekday_is_monday <- gsub('^1', 'Monday', news$weekday_is_monday)
news$weekday_is_tuesday <- gsub('^1', 'Tuesday', news$weekday_is_tuesday)
news$weekday_is_wednesday <- gsub('^1', 'Wednesday', news$weekday_is_wednesday)
news$weekday_is_thursday <- gsub('^1', 'Thursday', news$weekday_is_thursday)
news$weekday_is_friday <- gsub('^1', 'Friday', news$weekday_is_friday)
news$weekday_is_saturday <- gsub('^1', 'Saturday', news$weekday_is_saturday)
news$weekday_is_sunday <- gsub('^1', 'Sunday', news$weekday_is_sunday)
Next I found a solution in this thread that used the dplyr::coalesce function to merge all the fields. I adapted this to my dataset as follows:
news <- news %>%
  mutate_at(vars(starts_with("weekday_is")), funs(na_if(., "0"))) %>%
  mutate(news, publishing_day = coalesce(weekday_is_monday, weekday_is_tuesday,
                                         weekday_is_wednesday, weekday_is_thursday,
                                         weekday_is_friday, weekday_is_saturday,
                                         weekday_is_sunday))
news$publishing_day <- as.factor(news$publishing_day)
summary(news$publishing_day)
However, this only merges the fields from the first column (i.e. Monday):
     0 Monday
 32983   6661
Where am I going wrong here?
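
One thing that may be worth checking, sketched on a made-up three-row data frame (the name `toy` and its values are hypothetical, not taken from the dataset): since `news` is already piped into `mutate()`, passing it again as an argument may bring back the columns that still contain "0", so `coalesce()` never gets past the first one. The sketch below drops that extra argument. It also uses `across()` instead of `funs()`, which assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical toy data: three indicator columns already converted to strings,
# mirroring the state of `news` after the gsub() step.
toy <- data.frame(
  weekday_is_monday  = c("Monday", "0", "0"),
  weekday_is_tuesday = c("0", "Tuesday", "0"),
  weekday_is_sunday  = c("0", "0", "Sunday"),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  # Turn every "0" into NA so coalesce() can skip it.
  mutate(across(starts_with("weekday_is"), ~ na_if(.x, "0"))) %>%
  # The data frame arrives through the pipe, so it is not passed again here.
  mutate(publishing_day = coalesce(weekday_is_monday,
                                   weekday_is_tuesday,
                                   weekday_is_sunday))

print(toy$publishing_day)
```

If this behaves as expected on the toy data, the same change (removing `news` from inside `mutate()`) could be tried on the full pipeline with all seven weekday columns.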