
I have to analyze data from an economic experiment. My database contains 14,976 observations of 212 variables, including profit, total profit, the treatments, and other variables. There are two types of subjects:

  • Type 1 is for sellers

  • Type 2 is for buyers

For some variables, the results were recorded in the buyers' rows (Type 2) rather than the sellers' rows (a completely arbitrary choice). However, I would like to analyze, for instance, the gender of sellers who overcharged. So I need to reshape my database, and I don't know how to do this.

Here is part of the database:

ID       Gender   Period   Matching group   Group    Type  Overcharging ...
654        1           1            73         1        1      NA
654        1           2            73         1        1      NA
654        1           3            73         1        1      NA
654        1           4            73         1        1      NA 
435        1           1            73         2        1      NA
435        1           2            73         2        1      NA
435        1           3            73         2        1      NA
435        1           4            73         2        1      NA 
708        0           1            73         1        2       1
708        0           2            73         1        2       0
708        0           3            73         1        2       0
708        0           4            73         1        2       1   
546        1           1            73         2        2       0
546        1           2            73         2        2       0
546        1           3            73         2        2       1
546        1           4            73         2        2       0

To do this I have enough information: exactly one seller was matched with one buyer in a given period, group, matching group, and treatment. For example, in matching group 73 we know that at period 1 subject 708 (the one in group 1) was overcharged. Since this man belongs to group 1 and matching group 73, I can identify the seller who overcharged him at period 1: subject 654, with Gender = 1.

So I would like to copy the Overcharging values (and some others) from the buyer rows onto the seller rows (Type == 1), matched on the right period, group, and matching group, so that I can analyze sellers' behavior.


1 Answer


I have a longhand way of doing it with data.frames. If you plan to code in R long term, I would suggest checking out either (i) the dplyr package, part of the tidyverse suite, or (ii) the data.table package. The first has the most popular syntax and ties together nicely with a bunch of useful packages; the second is harder to learn but quicker, although for data of your size the difference is negligible.
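For comparison, a dplyr version of the merge done below might look roughly like this (a sketch using the same toy seller/buyer data as the base-R answer; `left_join` plays the role of `merge(..., all.x = TRUE)`):

```r
library(dplyr)

# toy seller and buyer rows mirroring the example data below
dt1 <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 1,
                  Overcharging = NA)
dt2 <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 2,
                  Overcharging = c(1, 0, 0, 1))

sellers <- dt1 %>%
  select(-Overcharging) %>%                 # drop the all-NA column first
  left_join(dt2 %>% select(Period, MatchGroup, Group, Overcharging),
            by = c("Period", "MatchGroup", "Group"))
sellers
```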

In base data.frames, here is something I hope matches your request. Let me know if I've mistaken anything or been unclear.

# sellers data eg
dt1 <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 1, 
                 Overcharging = NA)
# buyers data eg
dt2 <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 2, 
                 Overcharging = c(1,0,0,1))
# make my current data view
dt <- rbind(dt1, dt2)
dt[]

# split into two data frames on the Type column:
dt_split <- split(dt, dt$Type)
dt_split

# move out of list
dt_suffix <- seq_along(dt_split)
dt_names <- sprintf("dt%s", dt_suffix)
for(name in dt_names){
  assign(name, dt_split[match(name, dt_names)][[1]])
}
dt1[]
dt2[]

# define the columns in which to match up the buyer to seller
merge_cols <- c("Period", "MatchGroup", "Group")
# define the columns you want to merge, that you know are NA
na_cols <- c("Overcharging")
# now use merge operation, and filter dt2, to pull in only columns you want
# I suggest dropping the na_cols first in dt1, as otherwise it will create two 
# columns post-merge: Overcharging, i.Overcharging
dt1 <- dt1[,setdiff(names(dt1), na_cols)]
dt1_new <- merge(dt1, 
                 dt2[, c(merge_cols, na_cols)], # filter dt2 
                 by = merge_cols, # columns to match on
                 all.x = TRUE) # dt1 is x, dt2 is y. Want to keep all of dt1

# if you want to bind them back together, ensure the column order matches, and
# bind e.g.
dt1_new <- dt1_new[, names(dt2)]
dt_final <- rbind(dt1_new, dt2)
dt_final[]

My line of thinking is to split the buyers and sellers into two separate data frames, then identify how they join and migrate the data you need from buyers to sellers, and finally bring them back together if so desired.
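Since I use data.table myself: the same migration can be done in one step there with an update-join, which writes the buyer's values into the seller rows by reference (again a sketch on the toy data above; `i.Overcharging` refers to the column coming from the joined table):

```r
library(data.table)

# toy seller and buyer rows mirroring the example data
dt1 <- data.table(Period = 1:4, MatchGroup = 73, Group = 1, Type = 1,
                  Overcharging = NA_real_)
dt2 <- data.table(Period = 1:4, MatchGroup = 73, Group = 1, Type = 2,
                  Overcharging = c(1, 0, 0, 1))

# update-join: pull the buyer's Overcharging into the matching seller rows
dt1[dt2, Overcharging := i.Overcharging,
    on = .(Period, MatchGroup, Group)]
dt1
```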

  • Thanks. I think you understood the point of my question well. However, I tried your solution and got a dt1_new database with 16,704 observations, which is more than my original database. How is that possible? Another question: you mentioned the dplyr package; do you have a solution using it? – Marc Sep 17 '17 at 12:12
  • Usually when a data frame increases in size it will be because of the _merge_ operation. If there are duplicates in _dt2_, for the conditions in the _by_ clause, it will create rows that didn't exist before. Try running the code above again, this time with dt2 created with this code: _# buyers data eg dt2 <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 2, Overcharging = c(1,0,0,1)) # add duplicate dt2 <- rbind(dt2, data.frame(Period = 1, MatchGroup = 73, Group = 1, Type = 2, Overcharging = 1))_ – Jonny Phelps Sep 17 '17 at 12:49
  • If you do have duplicates, you can find them using the _duplicated_ function. Stackoverflow will have lots on duplication in data already e.g. https://stackoverflow.com/questions/13742446/duplicates-in-multiple-columns I use the latter package (data.table), so I'm not the best person to ask for help with dplyr. I think the code I sent will work ok with data.table too. I'd recommend looking at tutorials of both first, and pick the one you feel most comfortable with. – Jonny Phelps Sep 17 '17 at 12:54
  • Thanks for your time! – Marc Sep 17 '17 at 13:06