
I'm new to R and am trying to solve the following problem:

I have a data frame that was read from a CSV file:

data <- read.table(file = 'data.csv', header = TRUE, sep = ';', strip.white = TRUE)

The CSV looks like the following header and two data lines, but is much longer (1.7 million lines):

sequence_nbr;tmstmp;source_addr;add_data
1;2016-07-10 10:09:20;3.6.25;data1
2;2016-07-10 10:09:20;3.6.28;data2

There are 55 different source_addr values in the file, and I'm trying to find the first occurrence of each address.

As I'm new to this, I have no idea how to create a table containing tmstmp and source_addr for those first occurrences.

I would love to understand a possible way to do this.

  • If you need the first occurrence of each `source_addr`, you could do `library(dplyr); df %>% group_by(source_addr) %>% slice(which.min(tmstmp))`, provided `tmstmp` is of class POSIXct. – Ronak Shah Oct 11 '18 at 07:06
  • Wonderful, that works! Thanks a lot. I was confused by the `%>%` at first, but a quick search showed that it is a pipe-like operator. Can you post this as an answer so that I can accept it? – Johannes Oct 11 '18 at 07:17
  • I am glad that it worked. Actually, this question has been asked before, and I have marked it as a duplicate. Look at the linked post and you will find multiple ways to do the same thing. You can choose whichever approach suits you best. – Ronak Shah Oct 11 '18 at 07:19
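
For readers who want to try the approach from the comments, here is a minimal sketch on a few sample rows. The first two rows come from the question; rows 3 and 4, the inline `csv_text` variable, and the `UTC` time zone are assumptions made only for illustration. It first converts `tmstmp` to POSIXct, then shows a base R variant (sort by time, keep the first row per address) next to the dplyr pipeline from the comment, which runs only if dplyr is installed.

```r
# Sample data in the layout from the question. Rows 3 and 4 are invented
# for illustration; with the real file, use file = 'data.csv' instead of text =.
csv_text <- "sequence_nbr;tmstmp;source_addr;add_data
1;2016-07-10 10:09:20;3.6.25;data1
2;2016-07-10 10:09:20;3.6.28;data2
3;2016-07-10 10:09:25;3.6.25;data3
4;2016-07-10 10:09:31;3.6.28;data4"

data <- read.table(text = csv_text, header = TRUE, sep = ";",
                   strip.white = TRUE, stringsAsFactors = FALSE)

# Parse the timestamps so they compare as times rather than as strings.
# The time zone is an assumption; the question does not state one.
data$tmstmp <- as.POSIXct(data$tmstmp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Base R: sort by time (ties broken by sequence number), then keep the
# first row encountered for each address.
ordered <- data[order(data$tmstmp, data$sequence_nbr), ]
first_seen <- ordered[!duplicated(ordered$source_addr),
                      c("source_addr", "tmstmp")]

# dplyr variant of the pipeline suggested in the comment.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  first_seen_dplyr <- data %>%
    group_by(source_addr) %>%
    slice(which.min(tmstmp)) %>%
    ungroup() %>%
    select(source_addr, tmstmp)
}

print(first_seen)
```

Both variants return one row per `source_addr` with its earliest `tmstmp`. On 1.7 million rows the sort in the base R variant is the main cost; if the file is already in chronological order, `data[!duplicated(data$source_addr), ]` alone should be enough.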
