I'm learning R by using it on a project where I need to extract unique paths from logs.

My workaround (the lower part of the code) works, but I had to split the log into two files and perform the grouping on each file separately. When I tried the same thing on variables, I got all the data in all three path counts.

Can someone point out what is wrong with the first approach? I doubt that physically writing files to disk is the intended way.

a <- read.csv('download-report-06-10-2017.csv')
yesterdays_data <- a[grepl("2017-10-05", a$Download.Time), ]
todays_data <- a[grepl("2017-10-06", a$Download.Time), ]

write.csv(yesterdays_data, "yesterdays.csv")
write.csv(todays_data, "todays.csv")

path_count <- as.data.frame(table(a$Path))
path_count_today <- as.data.frame(table(todays_data$Path))
path_count_yday <- as.data.frame(table(yesterdays_data$Path))
#### path_count, path_count_today and path_count_yday contain the same values, but I expect them to be different

yd <- read.csv('yesterdays.csv')
td <- read.csv('todays.csv')

path_count_td <- as.data.frame(table(td$Path))
path_count_yd <- as.data.frame(table(yd$Path))

#### path_count_td and path_count_yd are different, which is what I'd expect from the three variables above
  • Can you add `str(a)` to your post? Also read about [reproducible example](http://stackoverflow.com/questions/5963269). – zx8754 Oct 06 '17 at 11:21
  • How can the answer to the question in your title be anything other than "of course"? Datasets can be subsetted in various ways and the result can be assigned to a variable. Perhaps you can pick a title for your question that better reflects your actual question. – John Coleman Oct 06 '17 at 12:03
  • @JohnColeman You are correct. But, since I'm new, just learning R (my second day of actual hands-on experience), I don't really know what causes the different result between the two approaches. So I kinda need an explanation of why it is different. If you know what the difference is, I'm more than OK to change the title/question. :) – Balkyto Oct 06 '17 at 12:25
  • @Balkyto I don't know -- maybe something like "How to extract unique paths from logs" -- though it would be better to try to make a reproducible example rather than worry too much about the title. It is always good to have a [mcve] on Stack Overflow, but in the R tag it is especially important since the R community places a heavy weight on reproducibility. – John Coleman Oct 06 '17 at 12:31
  • @zx8754 - adding str(a) causes an error. I've tried reading reproducible example and a few things from there, but that did not help (yet). – Balkyto Oct 06 '17 at 12:32
  • @JohnColeman - It's not about path. This is a theoretical question why grouping or subsetting data the way I did first does not work, vs it works when two separate files are loaded. I just don't have the proper vocabulary just yet, and R tag people would most likely know what I'm talking about. I guess. – Balkyto Oct 06 '17 at 12:34
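
One possible explanation, offered as a sketch rather than a confirmed diagnosis since `str(a)` was never posted: if `Path` was read as a factor (the `read.csv` default before R 4.0.0), a subset of `a` keeps every factor level, and `table()` lists all levels, with a count of 0 for paths absent from the subset. Writing to disk and reading back rebuilds the factor from only the values present, which would explain why the workaround behaves differently. The data frame below is a hypothetical stand-in for the log; only the column names come from the question.

```r
# Hypothetical stand-in for the log; column names follow the question.
a <- data.frame(
  Download.Time = c("2017-10-05 10:00", "2017-10-05 11:00", "2017-10-06 09:00"),
  Path = c("/a", "/b", "/c"),
  stringsAsFactors = TRUE  # assumption: mimics the read.csv default before R 4.0.0
)

todays_data <- a[grepl("2017-10-06", a$Download.Time), ]

# The subset still carries every level of Path, so table() lists all paths,
# with a count of 0 for those that do not occur in the subset.
with_unused <- as.data.frame(table(todays_data$Path))

# droplevels() discards unused levels, leaving only the paths present today.
path_count_today <- as.data.frame(table(droplevels(todays_data$Path)))
```

If this is the cause, reading the file with `read.csv(..., stringsAsFactors = FALSE)` should also avoid it, because `table()` on a character column counts only the values that occur.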
