I have a data frame of dimension 170000 x 3, and I want to visualize which ColumnA values target which ColumnB values. The data frame looks like this:
The output expected:
A1 <- A,B
B1 <- B,E
If it is network data, you can use igraph to plot it and to calculate network statistics.
The edges of the resulting graph match the output you want, like A1 <- A,B and B1 <- B,E.
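# Toy edge list: ColA is the source, ColB is the target
df <-
data.frame(
ColA = c("A1", "A1", "A1", "A4", "B1", "B1", "B2", "B3", "C170000"),
ColB = c("A", "B", "C", "D", "B", "E", "A", "D", "H")
)
library(igraph)
# Each row of the two-column matrix becomes one directed edge
g <- graph_from_edgelist(as.matrix(df))
# layout_with_fr() is the current name of layout.fruchterman.reingold()
layout <- layout_with_fr(g)
g # Print the graph structure; the console output follows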
IGRAPH bc711d6 DN-- 12 9 --
+ attr: name (v/c)
+ edges from bc711d6 (vertex names):
[1] A1 ->A A1 ->B A1 ->C A4 ->D B1 ->B
[6] B1 ->E B2 ->A B3 ->D C170000->H
## Here is the plot
plot(g,layout=layout)
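If you also want the text form of the expected output (A1 <- A,B), the edge list can be collapsed per source without igraph. This is only a minimal base R sketch on the same toy data; `targets` and `arrows` are illustrative names that are not part of the original answer.

```r
# Same toy edge list as above: ColA is the source, ColB is the target.
df <- data.frame(
  ColA = c("A1", "A1", "A1", "A4", "B1", "B1", "B2", "B3", "C170000"),
  ColB = c("A", "B", "C", "D", "B", "E", "A", "D", "H"),
  stringsAsFactors = FALSE
)

# Collapse the unique targets of each source into one comma-separated string.
targets <- tapply(df$ColB, df$ColA, function(x) paste(unique(x), collapse = ","))

# Format one "source <- targets" line per source (illustrative name).
arrows <- paste(names(targets), "<-", targets)
print(arrows)
```

With 170000 rows this text summary will probably be easier to read than a plot of the full graph.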
Please dput a sample of your data instead of posting an image, which is not reproducible. It is also a bit unclear what you want to do. This is my guess with simulated data:
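set.seed(4)
# Simulated sources: A1..A5, B1..B5, C1..C5 (1:5 is recycled over the 300 rows)
c1 <- rep(LETTERS[1:3], each = 100)
c1 <- paste0(c1, 1:5)
# Simulated targets drawn from A..D
c2 <- sample(LETTERS[1:4], 300, replace = TRUE)
df <- data.frame(c1, c2)
df <- df[order(df$c1), ]
unique(df$c1)
# For each source, collapse its unique targets into one comma-separated string
l <- lapply(unique(df$c1), function(x) paste(unique(df[df$c1 == x, "c2"]), collapse = ","))
# The unnamed second column is auto-named "unlist.l."
result <- data.frame(c1 = unique(df$c1), unlist(l))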
EDIT
Here too I am guessing what you want. Since I don't know the usual way to represent this kind of data, I made the plots that seem most logical to me.
If you want a sense of which combinations are most frequent, you could use a simple scatterplot with ggplot2 and geom_text, or a word cloud like those from the wordcloud package:
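library(ggplot2)
library(wordcloud)
library(wordcloud2)
# `freq` was not defined in the original post; assuming it counts each combination
# (as.data.frame(table(...)) yields the Var1 and Freq columns used below)
freq <- as.data.frame(table(result$unlist.l.))
freq$Freq <- as.factor(freq$Freq)
result <- merge.data.frame(result, freq, by.x = "unlist.l.", by.y = "Var1")
ggplot(result, aes(x = Freq, y = c1)) +
geom_text(aes(label = unlist.l., col = Freq))
wordcloud(
words = result$unlist.l.,
min.freq = 1,
max.words = 100,
random.order = FALSE,
rot.per = 0.35,
colors = brewer.pal(8, "Paired")
)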
which gives you something like this:
[Images: ggplot scatterplot; wordcloud plot]
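For a text-only check alongside the plots, the combinations can also be counted and sorted directly. This is just a minimal sketch on a hypothetical toy data frame: `res` and its `combo` column are made-up stand-ins for the collapsed targets, not names from the code above.

```r
# Hypothetical toy data: one row per source (c1) with its collapsed targets (combo).
res <- data.frame(
  c1 = c("A1", "A2", "B1", "B2", "C1"),
  combo = c("A,B", "A,B", "B,E", "A,B", "B,E"),
  stringsAsFactors = FALSE
)

# Count how often each combination occurs; table() output converts to
# a data frame with columns Var1 (the combination) and Freq (the count).
freq <- as.data.frame(table(res$combo), stringsAsFactors = FALSE)

# Sort so the most frequent combination comes first.
freq <- freq[order(-freq$Freq), ]
print(freq)
```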