
I have a dataframe:

location <- c("a", "b", "c", "d", "e", "e")
type <- c("city", "city", "town", "town", "village", "village")
code <- c("123", "112", "83749", "83465", "38484757", "3838891")
country <- c("zz", "zz", "zz", "zz", "zz", "zz")
df <- data.frame(location, type, code, country)

I want to group by location and convert the result to a dictionary, something like this:

{location:[[type], [code], [country]]}

I know this should be quite straightforward in Python, but I am not sure how to do it in R. I tried the code below using unclass, but still didn't get what I expected:

unclass(by(df, df$location, function(x) {
  tmp <- x$code
  setNames(tmp, x$location[1])
  tmp
})) -> location_mapping

Expected Output:

{
'a':[['city'],['123'],['zz']],
'b':[['city'],['112'],['zz']],
'c':[['town'],['83749'],['zz']],
'd':[['town'],['83465'],['zz']],
'e':[['village'],['38484757','3838891'],['zz']]
}
blackfury
  • R doesn't have dictionaries, although there are [some packages](https://stackoverflow.com/questions/7818970/is-there-a-dictionary-functionality-in-r) that provide similar functionality. What is it that you want to achieve with a dict-like structure? Maybe there is a more "R" way to get what you want. – D.J Jan 13 '23 at 06:05
  • I need to pass each location along with other values in the row to a function. Sometimes, one location can have multiple codes, so need to pass as a list. – blackfury Jan 13 '23 at 06:12
  • Please provide the expected output using your example data. – Darren Tsai Jan 13 '23 at 07:17
  • I have updated the question with expected output – blackfury Jan 13 '23 at 07:24

2 Answers


You can group by location, together with the columns that are constant within each location (country and type), and collapse the code values of each group into a list column.

library(dplyr)

dict <- df %>% 
  group_by(country, type, location) %>% 
  summarise(code = list(code), .groups = "drop")

dict
# # A tibble: 5 × 4
#   country type    location code     
#   <chr>   <chr>   <chr>    <list>
# 1 zz      city    a        <chr [1]>
# 2 zz      city    b        <chr [1]>
# 3 zz      town    c        <chr [1]>
# 4 zz      town    d        <chr [1]>
# 5 zz      village e        <chr [2]>

After splitting by location and converting to JSON, you get a structure close to the expected one (values appear in the order country, type, code):

split(select(dict, -location), dict$location) %>%
  jsonlite::toJSON(dataframe = "values", pretty = TRUE, auto_unbox = TRUE)

# {
#   "a": [["zz", "city", "123"]],
#   "b": [["zz", "city", "112"]],
#   "c": [["zz", "town", "83749"]],
#   "d": [["zz", "town", "83465"]],
#   "e": [["zz", "village", ["38484757", "3838891"]]]
# }
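
If the goal is an R object rather than a JSON string, a named list is the closest base R counterpart to a Python dict. The following is only a minimal sketch, not part of the original answer: it assumes the example data from the question, and the name `mapping` is hypothetical.

# Example data from the question
location <- c("a", "b", "c", "d", "e", "e")
type <- c("city", "city", "town", "town", "village", "village")
code <- c("123", "112", "83749", "83465", "38484757", "3838891")
country <- c("zz", "zz", "zz", "zz", "zz", "zz")
df <- data.frame(location, type, code, country, stringsAsFactors = FALSE)

# Split the rows by location, then keep the unique values of each remaining
# column, in the order type, code, country (as in the expected output)
mapping <- lapply(split(df, df$location), function(g) {
  unname(lapply(g[c("type", "code", "country")], unique))
})

mapping$e
# [[1]]
# [1] "village"
#
# [[2]]
# [1] "38484757" "3838891"
#
# [[3]]
# [1] "zz"

Elements are looked up by name, for example `mapping[["a"]]`. Passing `mapping` to `jsonlite::toJSON()` without `auto_unbox` should give JSON close to the expected output, although that is not verified here.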
Darren Tsai

#--- EDITED

From your updated question, something like this might be what you want. R has no curly-brace dictionary literal like Python does. Still, for the purpose of feeding the result into further functions, the code below does what you want:

library(dplyr)

location <- c("a", "b", "c", "d", "e", "e")
type <- c("city", "city", "town", "town", "village", "village")
code <- c("123", "112", "83749", "83465", "38484757", "3838891")
country <- c("zz", "zz", "zz", "zz", "zz", "zz")
df <- data.frame(location, type, code, country)

df %>% 
  dplyr::group_by(location) %>% 
  summarise(code = list(code), across()) %>% # collapse `code` into a list per location; `across()` keeps the other columns
  filter(!duplicated(location)) %>% # keep one row per location
  .[, c(1, 3, 2, 4)] # reorder columns: location, type, code, country

# A tibble: 5 × 4
# Groups:   location [5]
#   location type    code      country
#   <chr>    <chr>   <list>    <chr>  
# 1 a        city    <chr [1]> zz     
# 2 b        city    <chr [1]> zz     
# 3 c        town    <chr [1]> zz     
# 4 d        town    <chr [1]> zz     
# 5 e        village <chr [2]> zz     
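
Since the stated goal is to pass each location with its values to a function, here is a hedged sketch of one way to do that. It is not the answer's own code: it summarises with explicit `first()` calls instead of `across()` without arguments (which newer dplyr versions may warn about), and `describe_location` is a hypothetical stand-in for whatever function you actually need to call.

library(dplyr)

location <- c("a", "b", "c", "d", "e", "e")
type <- c("city", "city", "town", "town", "village", "village")
code <- c("123", "112", "83749", "83465", "38484757", "3838891")
country <- c("zz", "zz", "zz", "zz", "zz", "zz")
df <- data.frame(location, type, code, country, stringsAsFactors = FALSE)

# One row per location; `code` becomes a list column.
# Assumes type and country are constant within a location.
res <- df %>%
  group_by(location) %>%
  summarise(type = first(type), code = list(code), country = first(country))

# Hypothetical function standing in for the real consumer of each row
describe_location <- function(loc, ty, codes, ctry) {
  paste0(loc, " (", ty, ", ", ctry, "): ", paste(codes, collapse = ", "))
}

# Map() walks the columns in parallel; each element of res$code is a
# character vector, so a location with several codes arrives as one argument
out <- Map(describe_location, res$location, res$type, res$code, res$country)

out$e
# [1] "e (village, zz): 38484757, 3838891"

`Map()` names the result after the first argument when it is an unnamed character vector, so `out` can be indexed by location.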
D.J
  • There is an error if the location is of multiple strings, "Error in `mutate()`: ! Problem while computing `code3 = lapply(strsplit(code, split = ""), function(x) as.list(x))`. ℹ The error occurred in group 1: location = "Abc - Def". Caused by error in `strsplit()`: ! non-character argument Run `rlang::last_error()` to see where the error occurred." – blackfury Jan 13 '23 at 07:02
  • The error message points to a problem with `strsplit(code)`, as a non-character argument is passed to it. I assume you are passing numbers. To avoid this, you can wrap `code` in `as.character()` - I'll update the answer. If this does not solve it, please post the row in your question. – D.J Jan 13 '23 at 07:09
  • I have updated the question with expected output. – blackfury Jan 13 '23 at 07:24
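
The `strsplit()` error discussed in the comments above can be reproduced in isolation. This is a minimal sketch that assumes the codes are stored as numbers, as D.J guessed; `code_num`, `failed` and `chars` are hypothetical names.

# Numeric codes: strsplit() only accepts character input
code_num <- c(123, 112)

# tryCatch() captures the error message instead of stopping the script
failed <- tryCatch(strsplit(code_num, split = ""),
                   error = function(e) conditionMessage(e))
failed
# [1] "non-character argument"

# Converting to character first avoids the error
chars <- strsplit(as.character(code_num), split = "")
chars[[1]]
# [1] "1" "2" "3"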