I have the following data frames in R:
Id Class
@a 64
@b 7
@c 98
And the second data frame:
SOURCE TARGET
@d @b
@c @a
These describe the nodes and the edges of a social network. Each user (all prefixed with @) belongs to a specific community, whose number is listed in the Class column. To analyse the connections between the classes, I want to merge these data frames and create a new data frame looking like this:
SOURCE TARGET SOURCE.Class TARGET.Class
@a @i 56 2
@f @k 90 49
When I try merge(), R stops responding and I have to terminate it. The data frames have 20000 rows (node file) and 30000 rows (edge file).
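For reference, here is a minimal sketch of a merge() approach on toy data modelled on the tables above (the values are illustrative only, and the object names nodes, edges and m are made up for this example). It names the key columns explicitly with by.x and by.y. As far as I understand, when the two data frames share no column names and no keys are given, merge() returns the Cartesian product of the rows, which for 20000 x 30000 rows could plausibly explain the hang, although I have not confirmed that this is the cause.

```r
# Toy data modelled on the tables above (illustrative values only)
nodes <- data.frame(Id = c("@a", "@b", "@c"),
                    Class = c(64, 7, 98),
                    stringsAsFactors = FALSE)
edges <- data.frame(SOURCE = c("@d", "@c"),
                    TARGET = c("@b", "@a"),
                    stringsAsFactors = FALSE)

# Attach the class of the source user; all.x = TRUE keeps edges whose
# user is missing from the node table (their class becomes NA)
m <- merge(edges, nodes, by.x = "SOURCE", by.y = "Id", all.x = TRUE)
names(m)[names(m) == "Class"] <- "SOURCE.Class"

# Attach the class of the target user in the same way
m <- merge(m, nodes, by.x = "TARGET", by.y = "Id", all.x = TRUE)
names(m)[names(m) == "Class"] <- "TARGET.Class"

# Put the columns in the desired order
m <- m[, c("SOURCE", "TARGET", "SOURCE.Class", "TARGET.Class")]
print(m)
```

Note that merge() sorts the result by the key column, so the row order differs from the original edge list.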
Then I want to know how many records in a given source class have the same target class, and what percentage of connections run between classes.
I would be very grateful if someone could help, since I'm very new to R.
EDIT:
I think I managed to create the columns with the following code, using match() instead of merge() (rt_nodes contains the columns "id" and "modularity_class", and rt_edges contains the columns "Source" and "Target"):
# position of each edge's source in the node table (NA if not found)
src_idx <- match(rt_edges$Source, rt_nodes$id)
# create Source_Class by looking up the class at those positions
rt_edges$Source_Class <- rt_nodes$modularity_class[src_idx]
# position of each edge's target in the node table (NA if not found)
tgt_idx <- match(rt_edges$Target, rt_nodes$id)
# create Target_Class by looking up the class at those positions
rt_edges$Target_Class <- rt_nodes$modularity_class[tgt_idx]
Now I just need to figure out how I can find the percentage of connections in each class and the percentage of connections with other classes. Any tips on how to do that?
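One direction I am considering is a cross-tabulation with table() and prop.table(). The sketch below uses a made-up edge list that already has the Source_Class and Target_Class columns (the class values 1 to 3 are invented for illustration), so it may need adjusting for the real data.

```r
# Made-up edge list with the two class columns already attached
rt_edges <- data.frame(
  Source_Class = c(1, 1, 1, 2, 2, 3),
  Target_Class = c(1, 1, 2, 2, 3, 3)
)

# Number of edges for every (source class, target class) pair
counts <- table(Source = rt_edges$Source_Class,
                Target = rt_edges$Target_Class)

# Row-wise percentages: for each source class, the share of its
# edges that go to each target class
pct_by_source <- round(100 * prop.table(counts, margin = 1), 1)

# TRUE where an edge stays inside one class
within <- rt_edges$Source_Class == rt_edges$Target_Class

# Overall percentage of within-class edges
pct_within <- 100 * mean(within, na.rm = TRUE)

# Percentage of within-class edges per source class
pct_within_by_class <- 100 * tapply(within, rt_edges$Source_Class,
                                    mean, na.rm = TRUE)

print(counts)
print(pct_by_source)
print(pct_within_by_class)
```

The diagonal of counts holds the within-class connections and the off-diagonal cells hold the connections between classes; na.rm = TRUE drops edges whose class could not be matched.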