Merging two datasets in R with slightly altered variables for column names

Asked Jul 03 '15 at 06:20

Active Mar 11 '16 at 22:31

Viewed 196 times

I am extremely new to R and am trying to merge two data frames that have slightly different column names.

For example:

- The column names for data frame 1 are "Sample.1.a", "Sample.2.a", "Sample.3.a", "Sample.4.a", ... etc. for 300 samples.
- The column names for data frame 2 are "Sample.1", "Sample.2", "Sample.4", "Sample.5", ... etc. for 305 samples.

I need to find the intersection of these two data frames and remove the samples that do not appear in both. Any advice? As far as I can tell, the merge() function will not work for this.

I apologize if this first post is improperly formatted. I just need help.

edited Mar 11 '16 at 22:31

Jaap

asked Jul 03 '15 at 06:20

BTE0715

Please provide a reproducible example. You can use `sub` to change the column names in the first and use `intersect` with the column names of both datasets. – akrun Jul 03 '15 at 06:26
If you want to subset the columns that are common to both datasets: `names(dat1) <- sub('\\.[^0-9]+$', '', names(dat1));ind <- intersect(names(dat1), names(dat2)); dat1[ind]; dat2[ind]`. For posting guidelines, check [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – akrun Jul 03 '15 at 06:30
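The one-liner in the comment above can be unpacked into a small runnable sketch. The toy data frames here are made up for illustration (the question gives only the column names, not the data), and `dat1_common` / `dat2_common` are hypothetical names for the results.

```r
# Toy data frames; the values are invented, only the naming pattern
# ("Sample.N.a" vs "Sample.N") comes from the question.
dat1 <- data.frame(Sample.1.a = 1:3, Sample.2.a = 4:6,
                   Sample.3.a = 7:9, Sample.4.a = 10:12)
dat2 <- data.frame(Sample.1 = 1:3, Sample.2 = 4:6,
                   Sample.4 = 7:9, Sample.5 = 10:12)

# Drop the trailing ".a": a dot followed only by non-digit characters
# up to the end of the name.
names(dat1) <- sub("\\.[^0-9]+$", "", names(dat1))

# Column names that occur in both data frames.
ind <- intersect(names(dat1), names(dat2))

# Keep only the shared columns in each data frame.
dat1_common <- dat1[ind]
dat2_common <- dat2[ind]
```

With these toy inputs, `ind` holds "Sample.1", "Sample.2" and "Sample.4", so "Sample.3" (only in `dat1`) and "Sample.5" (only in `dat2`) are dropped.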
@akrun how do you post so fast! – Frash Jul 03 '15 at 06:37

@Frash It must be related to the experience in using R. – akrun Jul 03 '15 at 07:55
Thank you. I figured it out. I used strtrim to shorten the names so they matched in the two datasets and was able to merge with rbind from there. I'll make sure to provide clear examples in future posts! – BTE0715 Jul 04 '15 at 20:01
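The comment above does not show the code that was actually used, so the following is only a guess at what a `strtrim` plus `rbind` approach might look like. It assumes the suffix is always two characters long (".a"), and the toy data and the name `combined` are hypothetical.

```r
# Toy data frames; values are invented for illustration.
dat1 <- data.frame(Sample.1.a = 1:3, Sample.2.a = 4:6,
                   Sample.3.a = 7:9, Sample.4.a = 10:12)
dat2 <- data.frame(Sample.1 = 1:3, Sample.2 = 4:6,
                   Sample.4 = 7:9, Sample.5 = 10:12)

# Trim the last two characters (".a") from each name. The width is
# computed per name, so "Sample.1.a" and "Sample.10.a" are both handled.
# Assumption: every name in dat1 ends in exactly two suffix characters.
names(dat1) <- strtrim(names(dat1), nchar(names(dat1)) - 2)

# rbind() on data frames needs the same set of column names on both
# sides, so restrict each data frame to the shared columns first.
ind <- intersect(names(dat1), names(dat2))
combined <- rbind(dat1[ind], dat2[ind])
```

Note that `rbind` stacks the rows of the two data frames; if the samples should instead sit side by side as columns, `cbind` or `merge` on a key column would be the tool to look at.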
