
Say that I have a data frame:

```r
xy.df <- data.frame(x = runif(10), y = runif(10))
```

What I want to do is:

  1. Create a list of the unique (non-redundant) items in column 1.
  2. For each item in this list, identify the corresponding items in column 2.
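Both steps can be done in base R without dplyr: `unique()` gives the non-redundant values of column 1, and `split()` groups column 2 by column 1. This is a minimal sketch; the small `xy.df` below is hypothetical data with repeated `x` values (with `runif()` every `x` would almost surely be distinct, so each group would hold a single `y`):

```r
# Hypothetical toy data with repeated x values, so the grouping is visible
xy.df <- data.frame(x = c(1, 1, 2, 3, 3), y = c(10, 11, 20, 30, 31))

# Step 1: non-redundant (unique) items of column 1
x_unique <- unique(xy.df$x)

# Step 2: for each unique x, the corresponding items of column 2.
# split() returns a named list with one element per distinct x value;
# the names are the x values as character, in sorted order
y_by_x <- split(xy.df$y, xy.df$x)

print(x_unique)
print(y_by_x)
```

`split()` does the grouping and the de-duplication of the keys in one call, so the explicit `unique()` is only needed if you also want the keys as a separate vector.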

I have tried some tests with dplyr, but I still don't get it:

```r
df = xy.df %>% group_by(xy.df$x)
```

Any help would be appreciated.

C. DAVID
  • Do you really need a list for that? And what is your desired output? – pogibas Feb 08 '18 at 22:12
  • Yes, because in practice the first column in my case holds IP addresses and the second column holds the ports used on each IP address. I want to write to a file, one line per IP address, the IP address followed by the list of ports open on it. Thanks – C. DAVID Feb 08 '18 at 22:17
  • Please provide some accessible sample data and expected output as part of a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)! – Calum You Feb 08 '18 at 22:26
  • try `group_by(x)` rather than `group_by(xy.df$x)`; or maybe what you need is `distinct(xy.df, x, .keep_all = TRUE)` – seasmith Feb 08 '18 at 22:34
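The IP/port use case from the comments can be sketched in base R: `split()` groups the ports by IP, `unique()` drops repeated ports, and `writeLines()` writes one line per IP. The data frame `conn`, its values, and the output file name `ports.txt` are hypothetical, made up for illustration:

```r
# Hypothetical sample: one row per (IP, port) observation; values are illustrative only
conn <- data.frame(ip   = c("128.55.12.81", "128.55.12.81", "130.50.12.99", "128.55.12.81"),
                   port = c(9265, 59264, 63925, 9265),
                   stringsAsFactors = FALSE)

# Group the ports by IP, dropping duplicate ports within each IP
ports_by_ip <- lapply(split(conn$port, conn$ip), unique)

# One line per IP: the address followed by its space-separated ports
lines <- paste(names(ports_by_ip),
               vapply(ports_by_ip, paste, character(1), collapse = " "))

# "ports.txt" is a hypothetical output path
out_file <- "ports.txt"
writeLines(lines, out_file)
```

With this sample, the file gets one line for `128.55.12.81` (ports 9265 and 59264, the duplicate 9265 removed) and one for `130.50.12.99`.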

2 Answers


Sorry, I oversimplified my problem in the previous example, so here is a small example of the actual data frame:

| idProcess | ip           | port  |
|-----------|--------------|-------|
| 5aa78     | 128.55.12.81 | 9265  |
| 5aa78     | 128.55.12.81 | 59264 |
| 9a978     | 130.50.12.99 | 63925 |
| ...       | ...          | ...   |

So what I want is a list of lists, where each entry in the outer list is named after a process and, for that process, holds the non-redundant IPs and non-redundant ports together in one list, i.e.:

List["5aa78"] = (128.55.12.81, 9265, 59264)

List["9a978"] = (130.50.12.99, 63925) ...

Thanks.

C. DAVID

Try this:

Your data.frame:

```r
db <- data.frame(idProcess = c("5aa78", "5aa78", "9a978"),
                 ip        = c("128.55.12.81", "128.55.12.81", "130.50.12.99"),
                 port      = c(9265, 59264, 63925))
```

Building your output (this is not the most efficient way, but it makes clear what I'm doing):

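```r
# Named list for the result (not called `list`, which would shadow the base function list())
result <- list()
id_unique <- unique(as.character(db$idProcess))
for (i in seq_along(id_unique)) {
  # Rows belonging to the current process
  rows   <- as.character(db$idProcess) == id_unique[i]
  ip_i   <- unique(as.character(db[rows, "ip"]))
  port_i <- unique(as.character(db[rows, "port"]))
  # Store the unique IPs followed by the unique ports under the process id
  result[[id_unique[i]]] <- c(ip_i, port_i)
}
```

Your output:

```
result
$`5aa78`
[1] "128.55.12.81" "9265"         "59264"

$`9a978`
[1] "130.50.12.99" "63925"
```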
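For comparison, the same named list can be built without an explicit loop by splitting the rows of the data frame on `idProcess` and applying the same per-process logic to each piece with `lapply()`. This is a sketch, not the answerer's code; it re-creates `db` from the answer so it runs on its own:

```r
db <- data.frame(idProcess = c("5aa78", "5aa78", "9a978"),
                 ip        = c("128.55.12.81", "128.55.12.81", "130.50.12.99"),
                 port      = c(9265, 59264, 63925))

# split() on a data frame returns one sub-data-frame per idProcess value;
# for each one, keep its unique IPs followed by its unique ports, as character
result2 <- lapply(split(db, as.character(db$idProcess)), function(d) {
  c(unique(as.character(d$ip)), unique(as.character(d$port)))
})

print(result2)
```

Note that `split()` orders the groups by sorted `idProcess`, whereas the loop keeps the order of first appearance; for this sample the two coincide.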
Terru_theTerror