
I have a data frame with dimensions 170000 x 3, and I want to visualize which ColumnA values target which ColumnB values. The data frame looks like this:

[Image: sample of the data frame]

The expected output:

A1 <- A,B

B1 <- B,E

StupidWolf

2 Answers


If this is network data, you can use igraph to plot it and to calculate network statistics.

The printed graph structure matches the output you want, e.g. A1 <- A,B and B1 <- B,E.

df <-
  data.frame(
    ColA = c("A1", "A1", "A1", "A4", 'B1', "B1", "B2", "B3", "C170000"),
    ColB = c("A", "B", "C", "D", "B", "E", "A", "D", "H")
  )

library(igraph)

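# Build a directed graph: each row is one edge ColA -> ColB
g <- graph_from_edgelist(as.matrix(df))
# Fruchterman-Reingold layout (layout.fruchterman.reingold is the deprecated name)
layout <- layout_with_fr(g)

g  # Print the graph structure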

IGRAPH bc711d6 DN-- 12 9 -- 
+ attr: name (v/c)
+ edges from bc711d6 (vertex names):
[1] A1     ->A A1     ->B A1     ->C A4     ->D B1     ->B
[6] B1     ->E B2     ->A B3     ->D C170000->H


## Here is the plot 

plot(g,layout=layout)

[Image: plot of the resulting network]
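As a rough illustration of the "network statistics" mentioned above, here is a minimal sketch on the same toy edge list. The choice of statistics (out-degree, in-degree, neighbours of one vertex) is only an assumption about what might be useful here.

```r
library(igraph)

# Same toy edge list as in the answer above
df <- data.frame(
  ColA = c("A1", "A1", "A1", "A4", "B1", "B1", "B2", "B3", "C170000"),
  ColB = c("A", "B", "C", "D", "B", "E", "A", "D", "H")
)
g <- graph_from_edgelist(as.matrix(df))

# Out-degree: number of edges leaving each vertex (targets per ColA value)
out_deg <- degree(g, mode = "out")

# In-degree: number of edges arriving at each vertex (sources per ColB value)
in_deg <- degree(g, mode = "in")

# Names of the vertices that "A1" points to
a1_targets <- neighbors(g, "A1", mode = "out")$name
```

With 170000 rows a full plot will probably be unreadable, so summaries like these, or a plot of a subgraph, may be more practical.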

mustafaakben
  • And what if I add a column of correlations and want to do the same thing, but with the correlation data shown as well, e.g. cor = c(0.04195115, 0.04025928, 0.03939044, 0.03935268 ,0.03842027 ,0.03765408) – user15708301 Apr 29 '21 at 12:47
  • @mustafaakben, your graphical solution is way better than mine. Thanks for completing the answer with your knowledge – Elia Apr 29 '21 at 12:51
  • @Elia Thank you :) I like your word clouds and other graphs too. – mustafaakben Apr 29 '21 at 13:03
  • @user15708301 You can add the correlation as an edge weight, so that the correlations are reflected in the strength of the links. If you want, you can make a correlation network graph too. However, I assume that you are using genetic data, so it would be better to check this [link](https://cran.r-project.org/web/packages/graph4lg/vignettes/genetic_graph_2.html), "Genetic graph construction and analysis with graph4lg". It demonstrates how you can construct a graph and analyze genetic data. – mustafaakben Apr 29 '21 at 13:06
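The edge-weight idea from the last comment could look roughly like the sketch below. The pairing of the six correlation values with edges is hypothetical, since the original data was never posted, and the rescaling of widths is just one possible choice.

```r
library(igraph)

# Hypothetical pairing: the six correlations from the comment are attached
# to six made-up edges, because the real data was not posted
edges <- data.frame(
  ColA = c("A1", "A1", "A1", "A4", "B1", "B1"),
  ColB = c("A", "B", "C", "D", "B", "E"),
  cor  = c(0.04195115, 0.04025928, 0.03939044,
           0.03935268, 0.03842027, 0.03765408)
)

# Columns after the first two are stored as edge attributes
g <- graph_from_data_frame(edges, directed = TRUE)
E(g)$weight <- E(g)$cor

# Rescale to widths between 1 and 5 so small differences stay visible
w <- E(g)$weight
edge_width <- 1 + 4 * (w - min(w)) / (max(w) - min(w))

plot(g, edge.width = edge_width, edge.label = round(w, 3))
```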

Please dput a sample of your data instead of posting an image, which is not reproducible. It is also a bit unclear what you want to do. This is my guess with simulated data:

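# Simulate 300 rows: c1 values like "A1".."C5", c2 drawn from A-D
set.seed(4)
c1 <- rep(LETTERS[1:3], each = 100)
c1 <- paste0(c1, 1:5)
c2 <- sample(LETTERS[1:4], 300, replace = TRUE)
df <- data.frame(c1, c2)
df <- df[order(df$c1), ]

unique(df$c1)
# For each c1 value, collapse its unique c2 values into one comma-separated string
l <- lapply(unique(df$c1), function(x) paste(unique(df[df$c1 == x, "c2"]), collapse = ","))
# The second column is auto-named "unlist.l."
result <- data.frame(c1 = unique(df$c1), unlist(l))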
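For comparison, the same grouping can be sketched with base R's `aggregate()`. This is an alternative, not the code used in this answer, and the example data frame with columns `ColA` and `ColB` is borrowed from the other answer as a stand-in for the real data.

```r
# Stand-in data, borrowed from the other answer (not the asker's real data)
df <- data.frame(
  ColA = c("A1", "A1", "A1", "A4", "B1", "B1", "B2", "B3", "C170000"),
  ColB = c("A", "B", "C", "D", "B", "E", "A", "D", "H")
)

# One row per ColA value, with its unique ColB targets joined by commas
targets <- aggregate(
  ColB ~ ColA,
  data = df,
  FUN = function(x) paste(unique(x), collapse = ",")
)

print(targets)
```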

EDIT

Here too I am guessing at what you want. Since I don't know a standard way to represent this kind of data, I tried the plots that seem most logical (at least to me). If you want a sense of which combinations are more frequent, you could use a simple scatterplot with ggplot2 and geom_text, or a word cloud, such as those from the wordcloud package:

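library(ggplot2)
library(wordcloud)
library(wordcloud2)

# Assumed definition (freq is never defined in the original post):
# count each combination; as.data.frame(table(...)) yields columns Var1 and Freq
freq <- as.data.frame(table(result$unlist.l.))
freq$Freq <- as.factor(freq$Freq)
result <- merge.data.frame(result, freq, by.x = "unlist.l.", by.y = "Var1")

# Scatterplot: combination labels placed by frequency (x) and c1 value (y)
ggplot(result, aes(x = Freq, y = c1)) +
  geom_text(label = result$unlist.l., aes(col = Freq))

wordcloud(
  words = result$unlist.l.,
  min.freq = 1,
  max.words = 100,
  random.order = FALSE,
  rot.per = 0.35,
  colors = brewer.pal(8, "Paired")
)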

which gives you something like this:

ggplot

[Image: ggplot scatterplot of the combinations]

wordcloud

[Image: word cloud of the combinations]

Elia
  • Yes, exactly that way. If possible, could you also suggest how I can visualize that output as a network or in some other graphical form? – user15708301 Apr 29 '21 at 11:44