Load data from a .csv file and then save it in a dictionary in R

Question

I need to load data from a .csv file and then save it in a dictionary in R.

There are tens of thousands of rows of data to load from the .csv file.

The data format:

  country,region,value
     1  ,  north , 101
     1  ,  north , 219
     2  ,  south , 308
     2  ,  south , 862
   ... , ...     , ...

My expected result, saved in an R data structure:

    country , region, list of values
     1  north     101 , 219 
     2  south     308 , 862

So that I can get the values that are associated with the same country and region.

Each row may have a different country and region.

I need to store the values that share the same country and region together.

Any help would be appreciated.

Please [read this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). You need to give a better example and make your question clear — CHP, Mar 17 '14 at 01:11
I think [@Ista](http://stackoverflow.com/users/189946/ista)'s answer does what you want, but it should be noted there's no *dictionary* type in R. @Ista's use of a `data.frame` is probably what you're after, but if I read your question right, you'll need a `write.csv()` call after creating `dat` to save `dat` to a data file. — hrbrmstr, Mar 17 '14 at 01:48
@hrbrmstr , I need to load data from a file and then do some analysis on them. I have updated the OP. Thanks ! — user3356689, Mar 17 '14 at 02:15

score 0 · Answer 1 · answered Mar 17 '14 at 01:37

It's not clear exactly what you are willing to assume about the input data, nor exactly what the desired output is. Perhaps something like this:

tmp <- read.csv(text="country,region,value
     1  ,  north , 101
     1  ,  north , 219
     2  ,  south , 308
     2  ,  south , 862")

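# Flag rows whose country/region pair has already appeared.
dups <- duplicated(tmp[1:2])
# Pair each first occurrence with its duplicate (assumes exactly two rows per pair).
dat <- data.frame(tmp[!dups, 1:2], value = paste(tmp[!dups, 3], tmp[dups, 3], sep = " , "))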
dat
##   country   region     value
## 1       1   north  101 , 219
## 3       2   south  308 , 862
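
Note that the `paste()` call above lines up correctly only when every country/region pair occurs exactly twice. As a rough sketch for groups of any size (a variation on the code above, not the answerer's own), base R's `split()` can build a named list, which is about as close as R gets to a dictionary. The `strip.white = TRUE` argument and the name `vals` are choices made for this illustration; for a real file you would pass its path to `read.csv()` instead of `text =`.

```r
# Same sample data as above, read from an inline string instead of a file.
# strip.white = TRUE trims the padding around the region labels.
tmp <- read.csv(text = "country,region,value
     1  ,  north , 101
     1  ,  north , 219
     2  ,  south , 308
     2  ,  south , 862", strip.white = TRUE)

# Build a dictionary-like named list: one element per country/region pair.
# drop = TRUE omits combinations that never occur in the data.
vals <- split(tmp$value, list(tmp$country, tmp$region), drop = TRUE)

# Keys are "<country>.<region>"; look one up by name.
names(vals)
vals[["1.north"]]
```

Each element of `vals` is an ordinary numeric vector, so it can be passed straight to functions such as `mean()` or `range()`.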

score 0 · Answer 2 · answered Jul 19 '14 at 13:32

If I were you, I would stick with keeping your data in its "long" form. But if you really want to "aggregate" the data this way, you can look at the aggregate function:

Option 1: Values stored as a list in a column. Fun, but hell to deal with later on.

aggregate(value ~ country + region, tmp, I, simplify=FALSE)
#   country   region    value
# 1       1   north  101, 219
# 2       2   south  308, 862
str(.Last.value)
# 'data.frame':  2 obs. of  3 variables:
#  $ country: num  1 2
#  $ region : Factor w/ 2 levels "  north ","  south ": 1 2
#  $ value  :List of 2
#   ..$ 1:Class 'AsIs'  int [1:2] 101 219
#   ..$ 3:Class 'AsIs'  int [1:2] 308 862
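
Reading the grouped values back out of the list column might look like the sketch below. It assumes `tmp` was read with `strip.white = TRUE` (so the region labels carry no padding) and uses `c` rather than `I` as the aggregation function, which keeps the elements as plain vectors; `res` and `north1` are just illustrative names.

```r
# Sample data with whitespace stripped from the fields.
tmp <- read.csv(text = "country,region,value
     1  ,  north , 101
     1  ,  north , 219
     2  ,  south , 308
     2  ,  south , 862", strip.white = TRUE)

# One row per country/region pair; the value column is a list of vectors.
res <- aggregate(value ~ country + region, tmp, c, simplify = FALSE)

# Filter the list column by the key columns, then take the single match.
north1 <- res$value[res$country == 1 & res$region == "north"][[1]]
north1
sum(north1)
```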

Option 2: Values stored as a single comma-separated string per row, in a character column. Less hell to deal with later on, but would likely require further processing (splitting up again) to be of much use.

aggregate(value ~ country + region, tmp, paste, collapse = ",")
#   country   region   value
# 1       1   north  101,219
# 2       2   south  308,862
str(.Last.value)
# 'data.frame': 2 obs. of  3 variables:
#  $ country: num  1 2
#  $ region : Factor w/ 2 levels "  north ","  south ": 1 2
#  $ value  : chr  "101,219" "308,862"
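
For comparison, here is a minimal sketch of the "long" form suggested at the top of this answer: the data frame stays as loaded and the values for a pair are looked up on demand. The helper `values_for()` is hypothetical, written only for this example, and `strip.white = TRUE` is assumed so that region labels can be matched without padding.

```r
# Keep the data in long form, one row per observation.
tmp <- read.csv(text = "country,region,value
     1  ,  north , 101
     1  ,  north , 219
     2  ,  south , 308
     2  ,  south , 862", strip.white = TRUE)

# Hypothetical helper: return every value recorded for one country/region pair.
values_for <- function(data, country, region) {
  data$value[data$country == country & data$region == region]
}

values_for(tmp, 2, "south")
# A pair that does not occur yields an empty vector.
values_for(tmp, 3, "east")
```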
