
I have two CSV files, imported as data frames as shown below:

dd1 <- read.csv("C:\\Users\\sharmb5\\Desktop\\Truth Dataset\\Buckets.csv",stringsAsFactors = FALSE, header = TRUE, sep = ",", blank.lines.skip = TRUE)

dd <- read.csv("C:\\Users\\sharmb5\\Desktop\\test dataset\\Buckets.csv", header = TRUE, stringsAsFactors = FALSE, sep = ",", blank.lines.skip = TRUE)

dd1 (6 rows, 6 columns) should be a subset of dd (6 rows, 6 columns), although the rows may be in a different order. I want to perform this subset match between dd and dd1.

Attempts

  1. library(plyr) merge(dd, dd1)

  2. intersect(dd,dd1)

  3. inner_join(dd,dd1)

  4. fintersect(dd,dd1)

  5. merge.data.frame(dd,dd1)

I tried all of the above functions, each with its respective package, but they all give the same output:

[1] Bucket Account X..Objects Chargeable.Capacity Region Bucket.Creation.Date
<0 rows> (or 0-length row.names)

If I read the CSVs with fread instead, the result is the same.

dd <- fread("C:\\Users\\sharmb5\\Desktop\\test dataset\\Buckets.csv",stringsAsFactors = FALSE,blank.lines.skip = TRUE)

dd1 <- fread("C:\\Users\\sharmb5\\Desktop\\Truth Dataset\\subset.csv",stringsAsFactors = FALSE,blank.lines.skip = TRUE)

I can't understand where the problem is, because the same functions work fine when I supply the data manually (i.e. by creating a small data frame by hand).
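One possible explanation, sketched here on made-up data: when no `by` argument is given, `merge()` joins on every column name the two data frames share, so a row only matches if all shared columns are equal. The frames `a` and `b` and their columns are hypothetical and only illustrate the behaviour; they are not the real CSV contents.

```r
# Hypothetical toy data: same rows, different order
a <- data.frame(id = c("x", "y"), value = c(1.5, 2.5), stringsAsFactors = FALSE)
b <- data.frame(id = c("y", "x"), value = c(2.5, 1.5), stringsAsFactors = FALSE)

# No `by` given: merge() joins on all shared columns (id and value)
n_same <- nrow(merge(a, b))             # both rows match despite the shuffle

# Change one non-key column slightly, as rounding in a CSV export might
b$value <- c(2.50001, 1.50001)
n_diff <- nrow(merge(a, b))             # no row agrees on every column any more

# Joining on the key column only still pairs the rows up
n_key <- nrow(merge(a, b, by = "id"))

print(c(same = n_same, diff = n_diff, key = n_key))
```

If hand-made data frames match and the imported ones do not, a small difference in a single shared column could be enough to produce the 0-row result.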

Update

dput(head(dd)) is:

`structure(list(Bucket = c("ireland-bucket", "singapore-test-bucket", 
"virginia", "sydney-testbucket", "test-testing-purpose", 
"aws123"), Account = c("Kishore", "Kishore", "Kishore", "Kishore", 
"Kishore", "Kishore"), X..Objects = c(2L, 2L, 4L, 2L, 2L, 57L
), Chargeable.Capacity = c(5.22e-05, 5.02e-06, 0.000104157, 5.23e-05, 
4.54e-05, 0.008055141), Region = c("eu-1", "ap-1", 
"us-2", "ap-1", "us-2", "us-1"), Bucket.Creation.Date = c("2017-08-28T08:12:21.000Z", 
"2017-08-28T08:15:22.000Z", "2017-08-29T05:09:14.000Z", "2017-08-29T05:14:03.000Z", 
"2019-03-12T13:57:23.000Z", "2017-08-22T10:13:36.000Z")), row.names = c(NA, 
6L), class = "data.frame")`

and dput(head(dd1)) is

`structure(list(Bucket = c("ireland-bucket", "singapore-test-bucket", 
"virginia", "sydney-testbucket", "test-testing-purpose", 
"aws123"), Account = c("Kishore", "Kishore", "Kishore", "Kishore", 
"Kishore", "Kishore"), X..Objects = c(2L, 3L, 3L, 3L, 2L, 57L
), Chargeable.Capacity = c(5.22444024682045e-05, 5.01330941915512e-06, 
0.000104157254099846, 5.22788614034653e-05, 4.53982502222061e-05, 
0.00805514119565487), Region = c("eu-west-1", "ap-southeast-1", 
"us-east-1", "ap-southeast-2", "us-east-2", "us-east-1"), Bucket.Creation.Date = c("2017-08-28T08:12:21.000Z", 
"2017-08-28T08:14:22.000Z", "2017-08-29T05:09:14.000Z", "2017-08-29T05:14:01.000Z", 
"2019-03-12T13:57:23.000Z", "2017-08-22T10:13:36.000Z")), row.names = c(NA, 
6L), class = "data.frame")`
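In the two `dput` outputs above, `Chargeable.Capacity` and `Region` differ in every row, and `X..Objects` and `Bucket.Creation.Date` differ in some. A rough way to see this, assuming `Bucket` is meant to be the key, is to join on `Bucket` alone and compare the suffixed columns. The sketch below uses a reduced, three-column copy of the posted data to stay short.

```r
# Reduced copies of the posted data frames (values taken from the dput output)
dd <- data.frame(
  Bucket = c("ireland-bucket", "singapore-test-bucket", "virginia",
             "sydney-testbucket", "test-testing-purpose", "aws123"),
  X..Objects = c(2L, 2L, 4L, 2L, 2L, 57L),
  Region = c("eu-1", "ap-1", "us-2", "ap-1", "us-2", "us-1"),
  stringsAsFactors = FALSE
)
dd1 <- data.frame(
  Bucket = dd$Bucket,
  X..Objects = c(2L, 3L, 3L, 3L, 2L, 57L),
  Region = c("eu-west-1", "ap-southeast-1", "us-east-1",
             "ap-southeast-2", "us-east-2", "us-east-1"),
  stringsAsFactors = FALSE
)

# Join on the assumed key only; other shared columns get .x (dd) / .y (dd1) suffixes
m <- merge(dd, dd1, by = "Bucket")

# Rows where the object counts disagree between the two files
obj_diff <- m[m$X..Objects.x != m$X..Objects.y,
              c("Bucket", "X..Objects.x", "X..Objects.y")]
print(obj_diff)

# Rows where the region labels disagree
region_diff <- m[m$Region.x != m$Region.y, c("Bucket", "Region.x", "Region.y")]
print(region_diff)
```

The same comparison can be repeated for `Chargeable.Capacity` and `Bucket.Creation.Date` on the full data frames.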

Please help me resolve this issue. Thank you for any suggestions.

Bhavneet sharma
  • May be related (leading white space?): https://stackoverflow.com/questions/50435712/two-dataframes-after-merging-in-r-are-showing-0-rows-or-0-length-row-names – cgrafe Jun 10 '19 at 17:53
  • @cgrafe I tried what you referred to, but the output is the same (it didn't work for me), and I still can't figure out the problem. – Bhavneet sharma Jun 11 '19 at 06:12
  • I also want to do one more thing: a complete match of `dd1` against `dd`. It should return true if all of `dd1` is found in `dd` (whether `dd1` is a subset of `dd` or the full data frame) and false otherwise. – Bhavneet sharma Jun 11 '19 at 09:54
  • Do you need `merge(dd, dd1, all = TRUE)` ? – Ronak Shah Jun 11 '19 at 11:09
  • @RonakShah yes, it works, thanks. Please refer to this [question](https://stackoverflow.com/q/56541863/10738353); the actual requirement is given there. – Bhavneet sharma Jun 12 '19 at 06:48
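
The true/false check described in the comments could be sketched roughly as follows. `is_row_subset` is a hypothetical helper name, not a function from any package, and the sketch assumes both data frames have the same column names and that values are compared exactly (so rounded numbers will not match).

```r
# Hypothetical helper: TRUE if every row of `sub` also occurs in `full`,
# regardless of row order. Assumes all columns of `sub` exist in `full`.
is_row_subset <- function(sub, full) {
  sub_u  <- unique(sub)
  full_u <- unique(full[names(sub)])
  # merge() with no `by` joins on all shared columns, i.e. on whole rows here
  nrow(merge(sub_u, full_u)) == nrow(sub_u)
}

# Made-up example data
full <- data.frame(Bucket = c("a", "b", "c"), Objects = c(1L, 2L, 3L),
                   stringsAsFactors = FALSE)

shuffled <- full[c(3, 1), ]      # a shuffled subset of `full`
changed <- full[1:2, ]
changed$Objects[2] <- 99L        # one value no longer matches

print(is_row_subset(shuffled, full))  # TRUE
print(is_row_subset(changed, full))   # FALSE
```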
