I am trying to clean up column names in R. I am working with a JSON dataset that I imported into R with the jsonlite function "stream_in".

First, I tried the "gsub" and "paste" commands, but neither worked.

The problem seems to be this: when I inspect the data with head, it shows all the column names, even the ones containing "." and, strangely, spaces. But when I use the names command, it only shows the ones without dots or spaces. Any suggestions? I have columns with names such as

hours.Monday.open attributes.Alcohol

and I would like to remove the "."

I tried something like this

names(restaurant.data)[3] <- paste("HoursMondayOpen")

but that only removed the word before the first "." and the new column name was "HoursMondayOpen.Monday.Open"

I also tried

names(restaurant.data) <- gsub("\\.", "", names(restaurant.data))

but that didn't change anything, nor did it give me an error.

Does that help?

Here's the output from dput()

> dput(head(restaurant.data))
structure(list(business_id = c("5UmKMjUEUNdYWqANhGckJw", "UsFtqoBl7naz8AVUBZMjQQ", 
"3eu6MEFlq2Dg7bQh8QbdOg", "cE27W9VPgO88Qxe4ol6y_g", "HZdLhv6COCleJMo7nPl-RA", 
"mVHrayjG3uZ_RLHkLj-AMg"), FullAddress = c("4734 Lebanon Church Rd\nDravosburg, PA 15034", 
"202 McClure St\nDravosburg, PA 15034", "1 Ravine St\nDravosburg, PA 15034", 
"1530 Hamilton Rd\nBethel Park, PA 15234", "301 South Hills Village\nPittsburgh, PA 15241", 
"414 Hawkins Ave\nrankin, PA 15104"), HoursFridayClose = structure(list(
    Friday = structure(list(close = c("21:00", NA, NA, NA, "17:00", 
    "20:00"), open = c("11:00", NA, NA, NA, "10:00", "10:00")), .Names = c("close", 
    "open"), row.names = c(NA, 6L), class = "data.frame"), Tuesday = structure(list(
        close = c("21:00", NA, NA, NA, "21:00", "19:00"), open = c("11:00", 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Thursday = structure(list(
        close = c("21:00", NA, NA, NA, "17:00", "19:00"), open = c("11:00", 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Wednesday = structure(list(
        close = c("21:00", NA, NA, NA, "21:00", "19:00"), open = c("11:00", 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Monday = structure(list(
        close = c("21:00", NA, NA, NA, "21:00", NA), open = c("11:00", 
        NA, NA, NA, "10:00", NA)), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Sunday = structure(list(
        close = c(NA, NA, NA, NA, "18:00", NA), open = c(NA, 
        NA, NA, NA, "11:00", NA)), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Saturday = structure(list(
        close = c(NA, NA, NA, NA, "21:00", "16:00"), open = c(NA, 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame")), .Names = c("Friday", 
"Tuesday", "Thursday", "Wednesday", "Monday", "Sunday", "Saturday"
), row.names = c(NA, 6L), class = "data.frame"), open = c(TRUE, 
TRUE, TRUE, FALSE, TRUE, TRUE), categories = list(c("Fast Food", 
"Restaurants"), "Nightlife", c("Auto Repair", "Automotive"), 
    c("Active Life", "Mini Golf", "Golf"), c("Shopping", "Home Services", 
    "Internet Service Providers", "Mobile Phones", "Professional Services", 
    "Electronics"), c("Bars", "American (New)", "Nightlife", 
    "Lounges", "Restaurants")), city = c("Dravosburg", "Dravosburg", 
"Dravosburg", "Bethel Park", "Pittsburgh", "rankin"), review_count = c(4L, 
4L, 3L, 5L, 5L, 20L), name = c("Mr Hoagie", "Clancy's Pub", "Joe Cislo's Auto", 
"Cool Springs Golf Center", "Verizon", "Emil's Lounge"), neighborhoods = list(
    character(0), character(0), character(0), character(0), character(0), 
    character(0)), longitude = c(-79.9007057, -79.8868138, -79.889059, 
-80.0146597, -80.05998, -79.8802474), state = c("PA", "PA", "PA", 
"PA", "PA", "PA"), stars = c(4.5, 3.5, 5, 2.5, 2.5, 5), latitude = c(40.3543266, 
40.3505527, 40.3509559, 40.3541155, 40.35762, 40.4134643), attributes = structure(list(
    `Take-out` = c(TRUE, NA, NA, NA, NA, TRUE), `Drive-Thru` = c(FALSE, 
    NA, NA, NA, NA, NA), `Good For` = structure(list(dessert = c(FALSE, 
    NA, NA, NA, NA, FALSE), latenight = c(FALSE, NA, NA, NA, 
    NA, FALSE), lunch = c(FALSE, NA, NA, NA, NA, TRUE), dinner = c(FALSE, 
    NA, NA, NA, NA, FALSE), brunch = c(FALSE, NA, NA, NA, NA, 
    FALSE), breakfast = c(FALSE, NA, NA, NA, NA, FALSE)), .Names = c("dessert", 
    "latenight", "lunch", "dinner", "brunch", "breakfast"), row.names = c(NA, 
    6L), class = "data.frame"), Caters = c(FALSE, NA, NA, NA, 
    NA, TRUE), `Noise Level` = c("average", NA, NA, NA, NA, "average"
    ), `Takes Reservations` = c(FALSE, NA, NA, NA, NA, FALSE), 
    Delivery = c(FALSE, NA, NA, NA, NA, FALSE), Ambience = structure(list(
        romantic = c(FALSE, NA, NA, NA, NA, FALSE), intimate = c(FALSE, 
        NA, NA, NA, NA, FALSE), classy = c(FALSE, NA, NA, NA, 
        NA, FALSE), hipster = c(FALSE, NA, NA, NA, NA, FALSE), 
        divey = c(FALSE, NA, NA, NA, NA, FALSE), touristy = c(FALSE, 
        NA, NA, NA, NA, FALSE), trendy = c(FALSE, NA, NA, NA, 
        NA, FALSE), upscale = c(FALSE, NA, NA, NA, NA, FALSE), 
        casual = c(FALSE, NA, NA, NA, NA, FALSE)), .Names = c("romantic", 
    "intimate", "classy", "hipster", "divey", "touristy", "trendy", 
    "upscale", "casual"), row.names = c(NA, 6L), class = "data.frame"), 
    Parking = structure(list(garage = c(FALSE, NA, NA, NA, FALSE, 
    FALSE), street = c(FALSE, NA, NA, NA, FALSE, FALSE), validated = c(FALSE, 
    NA, NA, NA, FALSE, FALSE), lot = c(FALSE, NA, NA, NA, FALSE, 
    FALSE), valet = c(FALSE, NA, NA, NA, FALSE, FALSE)), .Names = c("garage", 
    "street", "validated", "lot", "valet"), row.names = c(NA, 
    6L), class = "data.frame"), `Has TV` = c(FALSE, NA, NA, NA, 
    NA, TRUE), `Outdoor Seating` = c(FALSE, FALSE, NA, NA, NA, 
    FALSE), Attire = c("casual", NA, NA, NA, NA, "casual"), Alcohol = c("none", 
    NA, NA, NA, NA, "full_bar"), `Waiter Service` = c(FALSE, 
    NA, NA, NA, NA, TRUE), `Accepts Credit Cards` = c(TRUE, TRUE, 
    NA, NA, FALSE, TRUE), `Good for Kids` = c(TRUE, NA, NA, TRUE, 
    NA, TRUE), `Good For Groups` = c(TRUE, TRUE, NA, NA, NA, 
    TRUE), `Price Range` = c(1L, 1L, NA, NA, 2L, 1L), `Happy Hour` = c(NA, 
    TRUE, NA, NA, NA, FALSE), `Good For Dancing` = c(NA, NA, 
    NA, NA, NA, FALSE), `Coat Check` = c(NA, NA, NA, NA, NA, 
    FALSE), Smoking = c(NA, NA, NA, NA, NA, "no"), `Wi-Fi` = c(NA, 
    NA, NA, NA, NA, "no"), Music = structure(list(dj = c(NA, 
    NA, NA, NA, NA, FALSE), background_music = c(NA, NA, NA, 
    NA, NA, NA), jukebox = c(NA, NA, NA, NA, NA, NA), live = c(NA, 
    NA, NA, NA, NA, NA), video = c(NA, NA, NA, NA, NA, NA), karaoke = c(NA, 
    NA, NA, NA, NA, NA)), .Names = c("dj", "background_music", 
    "jukebox", "live", "video", "karaoke"), row.names = c(NA, 
    6L), class = "data.frame"), `Wheelchair Accessible` = c(NA, 
    NA, NA, NA, NA, NA), `Dogs Allowed` = c(NA, NA, NA, NA, NA, 
    NA), BYOB = c(NA, NA, NA, NA, NA, NA), Corkage = c(NA, NA, 
    NA, NA, NA, NA), `BYOB/Corkage` = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), `Order at Counter` = c(NA, NA, NA, NA, NA, NA), `By Appointment Only` = c(NA, 
    NA, NA, NA, NA, NA), `Open 24 Hours` = c(NA, NA, NA, NA, 
    NA, NA), `Hair Types Specialized In` = structure(list(coloring = c(NA, 
    NA, NA, NA, NA, NA), africanamerican = c(NA, NA, NA, NA, 
    NA, NA), curly = c(NA, NA, NA, NA, NA, NA), perms = c(NA, 
    NA, NA, NA, NA, NA), kids = c(NA, NA, NA, NA, NA, NA), extensions = c(NA, 
    NA, NA, NA, NA, NA), asian = c(NA, NA, NA, NA, NA, NA), straightperms = c(NA, 
    NA, NA, NA, NA, NA)), .Names = c("coloring", "africanamerican", 
    "curly", "perms", "kids", "extensions", "asian", "straightperms"
    ), row.names = c(NA, 6L), class = "data.frame"), `Accepts Insurance` = c(NA, 
    NA, NA, NA, NA, NA), `Ages Allowed` = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), `Dietary Restrictions` = structure(list(`dairy-free` = c(NA, 
    NA, NA, NA, NA, NA), `gluten-free` = c(NA, NA, NA, NA, NA, 
    NA), vegan = c(NA, NA, NA, NA, NA, NA), kosher = c(NA, NA, 
    NA, NA, NA, NA), halal = c(NA, NA, NA, NA, NA, NA), `soy-free` = c(NA, 
    NA, NA, NA, NA, NA), vegetarian = c(NA, NA, NA, NA, NA, NA
    )), .Names = c("dairy-free", "gluten-free", "vegan", "kosher", 
    "halal", "soy-free", "vegetarian"), row.names = c(NA, 6L), class = "data.frame")), .Names = c("Take-out", 
"Drive-Thru", "Good For", "Caters", "Noise Level", "Takes Reservations", 
"Delivery", "Ambience", "Parking", "Has TV", "Outdoor Seating", 
"Attire", "Alcohol", "Waiter Service", "Accepts Credit Cards", 
"Good for Kids", "Good For Groups", "Price Range", "Happy Hour", 
"Good For Dancing", "Coat Check", "Smoking", "Wi-Fi", "Music", 
"Wheelchair Accessible", "Dogs Allowed", "BYOB", "Corkage", "BYOB/Corkage", 
"Order at Counter", "By Appointment Only", "Open 24 Hours", "Hair Types Specialized In", 
"Accepts Insurance", "Ages Allowed", "Dietary Restrictions"), row.names = c(NA, 
6L), class = "data.frame"), type = c("business", "business", 
"business", "business", "business", "business")), .Names = c("business_id", 
"FullAddress", "HoursFridayClose", "open", "categories", "city", 
"review_count", "name", "neighborhoods", "longitude", "state", 
"stars", "latitude", "attributes", "type"), row.names = c(NA, 
6L), class = "data.frame")
> 

Here it is, in all of its glory!

Los_Cairos
  • Can you give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) with what you have and what you want? – alistaire Mar 23 '16 at 18:51
  • @Los_Cairos: please edit your question, don't paste the code in comments – digEmAll Mar 23 '16 at 19:00
  • I apologize. I am still new! – Los_Cairos Mar 23 '16 at 19:05
  • @Los_Cairos: no problem, please copy the output of this function: `dput(head(restaurant.data))` and paste in your question... this would be much more useful to us – digEmAll Mar 23 '16 at 19:06
  • @digEmAll, I pasted the output. It is quite messy. I'd like to point out again that this is "streamed in" from a JSON dataset. I am not sure if that makes a difference. Thanks in advance! – Los_Cairos Mar 23 '16 at 19:15
  • Your data is quite convoluted (read "messy"...) since you have a `data.frame` where some columns are in turn a `data.frame` whose columns are again `data.frame`s O_O ... now I gotta go, maybe later I'll have a look and try to help if no one has answered yet... – digEmAll Mar 23 '16 at 19:23
  • You have a nested data.frame. Try `tidyr::unnest` to make it a 2d-data.frame. – EDi Mar 23 '16 at 19:25
  • @EDi: I used `tidyr::unnest` on the original data set and got this error: `Error: data_frames can only contain 1d atomic vectors and lists`. So I used a function within `jsonlite` called `flatten` that unwinds some of these dataframe-within-dataframe problems. I then used `tidyr::unnest` on the new dataset and got this error: `Error: All nested columns must have the same number of elements.` Thanks for the help! – Los_Cairos Mar 23 '16 at 19:57
  • Please give us an example of your JSON file. – EDi Mar 23 '16 at 20:24
  • @EDi: That's what I have added to the question: the results from the `dput()` function. I don't have other, non-text ways of viewing the file. – Los_Cairos Mar 23 '16 at 21:37
  • That is an R data.frame, not JSON. – EDi Mar 23 '16 at 22:06

1 Answer

Your data is quite convoluted (I would say messy...) since you have a data.frame where some columns are in turn a data.frame whose columns are again data.frames. Some columns are also plain lists with elements of different lengths inside (i.e. the columns "neighborhoods" and "categories").
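This nesting is also the likely reason why head() and names() seem to disagree: printing flattens the nested columns for display, while names() only returns the top-level names, which contain no dots. A minimal sketch with made-up toy data (the object `toy` and its columns are hypothetical, not taken from the question):

```r
# Toy stand-in (hypothetical data) for a streamed-in, nested data.frame
hours <- data.frame(open = c("11:00", NA), close = c("21:00", NA),
                    stringsAsFactors = FALSE)
toy <- data.frame(id = c("a", "b"), stringsAsFactors = FALSE)
toy$hours <- hours   # store a whole data.frame as a single column

print(toy)           # the printed header is built from the nested names

top_names   <- names(toy)        # top-level names only
inner_names <- names(toy$hours)  # names inside the nested data.frame

# There is no dot in the top-level names, so this gsub changes nothing
unchanged <- gsub("\\.", "", top_names)
```

So a gsub over names(restaurant.data) runs without error but has nothing to replace until the nested columns are flattened.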

So, I would flatten where possible with this custom function:

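poormansUnnest <- function(nestedDF){
  # pieces to bind column-wise at the end
  toBind <- list()
  for(col in names(nestedDF)){
    if(is.data.frame(nestedDF[[col]])){
      # nested data.frame: flatten it recursively,
      # then prefix its column names with the parent column name
      df <- poormansUnnest(nestedDF[[col]])
      names(df) <- paste0(col,'.',names(df))
      toBind[[length(toBind)+1]] <- df
    }else{
      # any other column: keep it as a one-column data.frame
      toBind[[length(toBind)+1]] <- nestedDF[col]
    }
  }
  final <- do.call(cbind.data.frame,toBind)
  return(final)
}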

res <- poormansUnnest(restaurant.data)

# store list columns in separate object (then you would do whatever you need with them...)
categories <- res$categories
neighborhoods <- res$neighborhoods

# remove the list columns from the data.frame
res$categories <- NULL
res$neighborhoods <- NULL
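
What to do with the stored list columns depends on your needs, which the question doesn't state. As one possible option, and purely as an assumption on my part, each element could be collapsed into a single comma-separated string. The sketch below uses a hypothetical toy list rather than your data:

```r
# Hypothetical list column with elements of different lengths
categories_toy <- list(c("Fast Food", "Restaurants"), "Nightlife", character(0))

# Collapse each element into one string; empty elements become ""
categories_chr <- vapply(categories_toy, paste, character(1), collapse = ", ")
```

The result is an ordinary character vector, so it could be put back into the data.frame as a regular column.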

So now, you should be able to rename the columns of res with gsub
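
For instance, a rough sketch of that renaming step, using a hypothetical toy data.frame (`res_toy`) whose names merely imitate the flattened ones:

```r
# Hypothetical flattened result with names like those produced above
res_toy <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(res_toy) <- c("hours.Monday.open", "attributes.Alcohol",
                    "attributes.Good For.lunch")

# "[. ]" is a character class matching a literal dot or a space
names(res_toy) <- gsub("[. ]", "", names(res_toy))
new_names <- names(res_toy)
```

Stripping characters can make two names identical, so it may be worth checking anyDuplicated(names(res)) afterwards.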

digEmAll