Extract values from data frame, according to column names derived from a second data frame

Question

I am working with two different data frames. In the first one, I have as rows names of geographic centroids (point), and as column names IDs of an other set of points (train stations). The values represent the distance between the centroids and the points in the column (extracted by gDistance).

From this one, I extracted the 3 closest points saving their IDs in a translated data frame. This one has now as row the centroids IDs and as column just the rank (1,2,3).

distances<- ft2miles(gDistance(stations, centroids, byid=TRUE))
access <-as.data.frame(apply(distances, 1, function(X) colnames(distances)[order(X)][1:3]))
access <- as.data.frame(t(access))

I would like to add to the last data frame three columns with the corresponding distance values. To be more clear, the columns of the distances data frame, correspond to the values of the second one, row names are identical.

First data frame:

                   158      1559     1560     1561      1722      1723     1732 166516384      1546
croatia_new.0 1.108585 0.9434521 1.138825 1.621118 0.8457217 0.5554764 2.441048 0.7393321 6.4736425
croatia_new.1 1.107511 1.3381573 2.970976 3.448820 2.6142722 2.0175318 4.167618 1.6984548 6.2927608

Second data frame:

              closest1  closest2  closest3
croatia_new.0     1723 166516384      1722
croatia_new.1      158      1559 166516384
croatia_new.2     1546      1584  98349400
croatia_new.3     1584      1546  98349400

The values of this data frame are essentially the column names of the first data frame, indicating the ID of the closest stations. What I want to obtain:

                closest1    closest2    closest3
croatia_new.0   0.5554764   0.739332    0.8457217
croatia_new.1   1.107511    1.3381573   1.6984548
croatia_new.N    ....         ....       ....

Output obtained using DigEmAll solution

                     V1       V2       V3
croatia_new.0  3.719507       NA       NA
croatia_new.1 63.082290 38.04791 36.94511
croatia_new.2        NA       NA 49.05636
croatia_new.3 55.836949 47.74032 46.48799
croatia_new.4 41.484741       NA 52.85941
croatia_new.5 60.654512       NA 49.68948

console output:

> dput(distances["croatia_new.2",1:10])
structure(list(`158` = 6.99119002871976, `1559` = 7.05366412218041, 
`1560` = 6.99165462844427, `1561` = 6.79651949339854, `1722` =     7.09268368796389, 
    `1723` = 7.06552376797322, `1732` = 6.3722523477356, `166516384` = 7.1279696005328, 
`1546` = 0.153911810037831, `1584` = 0.433837036180278), .Names = c("158", 
"1559", "1560", "1561", "1722", "1723", "1732", "166516384", 
"1546", "1584"), row.names = "croatia_new.2", class = "data.frame")

dput(access["croatia_new.2",])

structure(list(V1 = structure(149L, .Names = "croatia_new.2", .Label =    c("1001", 
"1002", "1003", "1004", "1005", "1006", "1007", "1018", "1019", 
"1020", "1023", "1024", "1025", "1031", "1038", "1042", "1045", 
"1046", "1047", "1048", "1050", "1052", "1053", "1054", "1056", 
"105753770", "1059", "1061", "1062", "1064", "1065", "1067", 
"1073", "1076", "1085", "1088", "109", "1091", "1093", "1095", ....."999"),   class = "factor")), .Names = c("V1", 
"V2", "V3"), row.names = "croatia_new.2", class = "data.frame")

Please provide a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — digEmAll, Jan 04 '17 at 17:43
So you want to merge the 2 data.frame's, am I right ? Try `merge(DF1,DF2,by="StationNameColumn")` — digEmAll, Jan 04 '17 at 22:29
A merge could be useful as well, what I need at first is a version of dataframe1 - `distances` - with only 3 values per row. The value that refers to the IDcolumn saved in the dataframe2 - `access`. So, in your code, with `StationNameColumn`, you mean the columns of the station dataframe (a third dataframe)? Your code gives me this error: `Error in fix.by(by.y, y) : 'by' must specify uniquely valid columns` — Gabriele Filomena, Jan 05 '17 at 00:34
BTW, merge can be used only when the two data.frames share some common columns that you want to use as key to perform the merge itself. You cannot use row.names for that, that's the reason I wrote by="StationNameColumn". I meant you needed to add a column in both data.frame with the StationName taken from their row.names — digEmAll, Jan 05 '17 at 11:01

digEmAll · Accepted Answer · 2017-01-05T22:47:48.087

2

I think you want something like this :

### RECREATE THE INPUT EXAMPLE
DF1 <- read.csv(text=
"158,1559,1560,1561,1722,1723,1732,166516384,1546
croatia_new.0,1.108585,0.9434521,1.138825,1.621118,0.8457217,0.5554764,2.441048,0.7393321,6.4736425
croatia_new.1,1.107511,1.3381573,2.970976,3.448820,2.6142722,2.0175318,4.167618,1.6984548,6.2927608",
check.names=FALSE
)
DF2 <- read.csv(text=
"closest1,closest2,closest3
croatia_new.0,1723,166516384,1722
croatia_new.1,158,1559,166516384
croatia_new.2,1546,1584,98349400
croatia_new.3,1584,1546,98349400"
)

# ensure the values of DF2 are characters
DF2[] <- lapply(DF2, as.character)

# for each row of DF2, we search for the value in DF1 using the
# row names and column names stored in DF2.
# The result will be a list of numeric values.
rows2bind <- 
  lapply(1:nrow(DF2),function(i){
    station <- row.names(DF2)[i]
    row <- DF2[i,]
    row2 <- as.numeric(DF1[match(station,row.names(DF1)),])
    return(row2[match(as.character(row),colnames(DF1))])
  })

# we turn the list into a data.frame similar to DF2 but with values instead of column names
DF2ext <- setNames(as.data.frame(do.call(rbind,rows2bind),row.names=row.names(DF2)), colnames(DF2))

# we add the closest columns to DF1 
DF1ext <- cbind(DF1,DF2ext[match(row.names(DF1),row.names(DF2ext)),])

Results :
(note the last 3 columns of DF1ext)

> DF2ext
               closest1  closest2  closest3
croatia_new.0 0.5554764 0.7393321 0.8457217
croatia_new.1 1.1075110 1.3381573 1.6984548
croatia_new.2        NA        NA        NA
croatia_new.3        NA        NA        NA

> DF1ext
                   158      1559     1560     1561      1722      1723     1732 166516384     1546  closest1  closest2  closest3
croatia_new.0 1.108585 0.9434521 1.138825 1.621118 0.8457217 0.5554764 2.441048 0.7393321 6.473643 0.5554764 0.7393321 0.8457217
croatia_new.1 1.107511 1.3381573 2.970976 3.448820 2.6142722 2.0175318 4.167618 1.6984548 6.292761 1.1075110 1.3381573 1.6984548

edited Jan 05 '17 at 22:47

answered Jan 05 '17 at 10:58

digEmAll

56,430
9
115
140

So, just copying your code in a new script, it works perfectly, as you have shown. When I try to apply it to my script something goes wrong, since the result is weird. in the data-frame DF1ext some values are missing (NA), and when not, they are not as expected. Could be due to the number of rows in the two data set? the first one is: 862 rows 978 columns, the second one: 862 row, 3 columns. – Gabriele Filomena Jan 05 '17 at 18:15
NAs can be generated when a row name in DF1 is not present in DF2, or when a cell value in DF2 does not correspond to a DF1 column name... – digEmAll Jan 05 '17 at 18:21
@GabrieleFilomena: I think there might be a bug due to the fact that in R data.frame subsetting by row names works with partial matching (e.g. `DF["croatia.1",]` is equal to `DF["croatia.11",]` if only `croatia.11` exists). I changed the code to fix the problem, please try again ;) – digEmAll Jan 05 '17 at 18:43
I checked. So, row names are equivalents in DF1 and DF2. And all cell values in DF2 are columns in DF1. Names of rows are like croatia_new.1 to croatia_new.160. Than just numbers from 161 to 560, serbia_new.1 to 140 and similar for bosnia. Your last solution leads to the same output of the first one, I'll post it in my question. Could be the fact that colnames of DF2 are numbers? – Gabriele Filomena Jan 05 '17 at 21:20
I really don't understand where could be the problem... please post the console output of the following functions `dput(DF1["croatia_new.2",1:10])` and `dput(DF2["croatia_new.2",])`, in your question – digEmAll Jan 05 '17 at 21:48
The reason is that DF2 contains factors and not characters... convert it to characters before calling my code using : `DF2[] <- lapply(DF2, as.character)`. Or better, when you create the data.frame use the parameter stringsAsFactors = FALSE – digEmAll Jan 05 '17 at 22:45
@GabrieleFilomena: I've updated the code adding `DF2[] <- lapply(DF2, as.character)`... have you tried if this fixes the problems ? – digEmAll Jan 06 '17 at 09:09

Extract values from data frame, according to column names derived from a second data frame

1 Answers1