I asked a question that seems to be very difficult to solve, and I have been working on it for a few days already. I am trying to break it down into smaller questions so that I might get some help. The original question is here: extract information from a data frame
I have a data frame like the one below:
df<- structure(list(s1 = structure(1:3, .Label = c("3-4", "4-1", "5-4"
), class = "factor"), s2 = structure(1:3, .Label = c("2-4", "3-15",
"7-16"), class = "factor")), .Names = c("s1", "s2"), row.names = c(NA,
-3L), class = "data.frame")
In this example I have two columns, but the solution should not be specific to only two columns.
It looks like this:
> df
# s1 s2
#1 3-4 2-4
#2 4-1 3-15
#3 5-4 7-16
I want to count how many times each string after the - is repeated overall, and how many times it appears in each column.
For example, after the hyphen the first column contains 4, 1, 4 and the second column contains 4, 15, 16. So 4 is repeated 3 times, and 1, 15 and 16 each appear once:
M repeated
4 3
1 1
15 1
16 1
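To make the counting rule concrete, here is a minimal base R sketch of the overall count only. It assumes every cell contains exactly one hyphen; the names `after` and `repeated` are just illustrative, and `table()` orders the strings alphabetically rather than by frequency.

```r
# Rebuild the example data so the snippet is self-contained
df <- data.frame(s1 = c("3-4", "4-1", "5-4"),
                 s2 = c("2-4", "3-15", "7-16"),
                 stringsAsFactors = TRUE)

# Flatten all columns to one character vector, then drop everything
# up to and including the hyphen, keeping only the part after it
after <- sub("^[^-]*-", "", unlist(lapply(df, as.character)))

# Count how often each remaining string occurs across the whole data frame
repeated <- table(after)
repeated
```

This works for any number of columns, because `lapply(df, as.character)` walks over all of them.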
Looking at which columns these strings come from: 4 comes twice from the first column (s1) and once from the second column (s2), 1 comes once from s1, 15 comes once from s2, and 16 comes once from s2.
So the output I want looks like this:
M repeated COL1 COL2
4 3 2 1
1 1 1 -
15 1 - 1
16 1 - 1
What I have tried so far: thanks to @Arkun, I can melt the df:
library(reshape2)
M1 <- melt(df, id.vars = NULL)
The output looks like this:
>M1
# variable value
# 1 s1 3-4
# 2 s1 4-1
# 3 s1 5-4
# 4 s2 2-4
# 5 s2 3-15
# 6 s2 7-16
Then I split the values on the hyphen:
lst <- setNames(strsplit(M1$value, "-"), M1$variable)
Now I have the following:
>lst
#$s1
#[1] "3" "4"
#$s1
#[1] "4" "1"
#$s1
#[1] "5" "4"
#$s2
#[1] "2" "4"
#$s2
#[1] "3" "15"
#$s2
#[1] "7" "16"
From here I don't know how to get to the output I want.
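For reference, a rough sketch of the kind of step I am after, written in base R only (it does not use `melt` or `lst`). It is not a definitive solution: it assumes one hyphen per cell, it shows a missing combination as 0 rather than "-", and the names `long`, `counts` and `result` are just illustrative.

```r
# Rebuild the example data so the snippet is self-contained
df <- data.frame(s1 = c("3-4", "4-1", "5-4"),
                 s2 = c("2-4", "3-15", "7-16"),
                 stringsAsFactors = TRUE)

# Long form: one row per cell, with `ind` remembering the source column
long <- stack(lapply(df, as.character))

# Keep only the part after the hyphen
long$M <- sub("^[^-]*-", "", long$values)

# Cross-tabulate string by source column (one count column per data column)
counts <- as.data.frame.matrix(table(long$M, long$ind))

# Add the string itself and the overall total, then sort by total
result <- cbind(M = rownames(counts), repeated = rowSums(counts), counts)
rownames(result) <- NULL
result <- result[order(-result$repeated), ]
result
```

Because the cross-tabulation is built from `ind`, the same code should extend to more than two columns without changes.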