I can't figure this problem out on my own, so I'm asking all of you for a little help.
This is part of my data:
rfam[1:20,]
id name
1 RF00001 LL_skoljka_r41782307_x1
2 RF00001 LL_skoljka_r9950955_x1
3 RF00001 LL_skoljka_r49323482_x1
4 RF00001 LL_skoljka_r14141437_x1
5 RF00001 LL_skoljka_r16457227_x3
6 RF00002 LL_skoljka_r40347558_x1
7 RF00002 LL_skoljka_r44415149_x1
8 RF00002 LL_skoljka_r13145032_x1
9 RF00002 LL_skoljka_r29248915_x42
10 RF00003 LL_skoljka_r15936986_x1
11 RF00003 LL_skoljka_r28953530_x1
12 RF00003 LL_skoljka_r32665758_x1
13 RF00003 LL_skoljka_r32835489_x1
14 RF00003 LL_skoljka_r32835498_x1
15 RF04051 LL_skoljka_r33254611_x1
16 RF04051 LL_skoljka_r29761867_x12
17 RF04051 LL_skoljka_r45123665_x2
18 RF04051 LL_skoljka_r34837827_x15
19 RF08595 LL_skoljka_r38900754_x1
20 RF08595 LL_skoljka_r22016530_x1
In the first step I want to remove everything up to and including the x in the variable name, so I use:
rfam$name<- as.data.frame(sapply(rfam$name, gsub, pattern='^.*?x', replacement=""))
Result:
rfam[1:20,]
id name
1 RF00001 1
2 RF00001 1
3 RF00001 1
4 RF00001 1
5 RF00001 3
6 RF00002 1
7 RF00002 1
8 RF00002 1
9 RF00002 42
10 RF00003 1
11 RF00003 1
12 RF00003 1
13 RF00003 1
14 RF00003 1
15 RF04051 1
16 RF04051 12
17 RF04051 2
18 RF04051 15
19 RF08595 1
20 RF08595 1
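For comparison, the same stripping step can be sketched without sapply and as.data.frame, since gsub already works on a whole vector. This is only a sketch on a toy data frame (the name toy and its five rows are made up from the sample above), so it may not carry over to the real data unchanged:

```r
# Toy data frame rebuilt from a few rows of the sample above (hypothetical name: toy)
toy <- data.frame(
  id   = c("RF00001", "RF00001", "RF00002", "RF00002", "RF00003"),
  name = c("LL_skoljka_r41782307_x1", "LL_skoljka_r16457227_x3",
           "LL_skoljka_r40347558_x1", "LL_skoljka_r29248915_x42",
           "LL_skoljka_r15936986_x1"),
  stringsAsFactors = TRUE
)

# gsub is vectorized and returns a character vector, so the column stays a
# plain vector; the greedy pattern removes everything up to the last x
toy$name <- as.numeric(gsub("^.*x", "", toy$name))

# Inspect the type of the resulting column
str(toy$name)
```

Here the column is assigned a plain numeric vector, whereas the call above assigns the output of as.data.frame to a single column.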
In the second step I would like to sum the values that remain in name for each id.
The result should look like this:
view(rfam)
id name
1 RF00001 7
2 RF00002 45
3 RF00003 5
4 RF04051 30
5 RF08595 2
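The summing I have in mind could be sketched like this, assuming name is already a plain numeric column. The data frame toy is hypothetical and rebuilt from the 20 sample rows above, and aggregate from base R is just one possible way to do the grouping:

```r
# Toy data rebuilt from the 20 sample rows above (hypothetical name: toy)
toy <- data.frame(
  id   = rep(c("RF00001", "RF00002", "RF00003", "RF04051", "RF08595"),
             times = c(5, 4, 5, 4, 2)),
  name = c(1, 1, 1, 1, 3,
           1, 1, 1, 42,
           1, 1, 1, 1, 1,
           1, 12, 2, 15,
           1, 1)
)

# Sum name within each id using the formula interface of aggregate()
totals <- aggregate(name ~ id, data = toy, FUN = sum)
print(totals)
```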
To sum the values, the variable has to be numeric. Both of my variables are factors, so I converted id to character using rfam[,1]=as.character(rfam[,1]) and tried to convert name to numeric with rfam[,2]=as.numeric(levels(rfam[,2])[rfam[,2]]). The conversion of id was successful, while name comes back as NAs.
I've also tried rfam[,2]=as.numeric(as.character(rfam[,2])), but the result was the same.
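For reference, on an ordinary factor both conversion idioms give the numbers I expect, so the name column here may not be a plain factor. A minimal sketch with a made-up factor f (not taken from my data):

```r
# Made-up factor holding numbers as labels (hypothetical name: f)
f <- factor(c("1", "3", "42", "1"))

# Idiom 1: convert the levels once, then index them by the factor codes
via_levels <- as.numeric(levels(f))[f]

# Idiom 2: go through character first
via_character <- as.numeric(as.character(f))

print(via_levels)
print(via_character)

# class() shows what kind of object a column really is
print(class(f))
```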
I've also tried exporting the data to a txt file so I could do the rest of the analysis in Excel, but the exported data looks like this:
"id" "name"
"1" "RF00001" c(1, 1, 1, 1, 9, 1, 1, 1, 11, 1, 1, 1, 1, 1, 1, 3, 7, 5, 1, 1, 1, 9, 1, 14, 10, 7, 1, 5, 1, 1, 1, 1, 1, 7, 1, 2, 1, 1, 1, 9, 1, 7, 1, 1, 1, 1, 1, 1, 10, 7, 1, 10, 7, 1, 1, 1, 1, 1, 7, 1, 10, 1, 1, 1, 1, 1, 1, 1, 7, 1,...)
"2" "RF00001" c(1, 1, 1, 1, 9, 1, 1, 1, 11, 1, 1, 1, 1, 1, 1, 3, 7, 5, 1, 1, 1, 9, 1, 14, 10, 7, 1, 5, 1, 1, 1, 1, 1, 7, 1, 2, 1, 1, 1, 9, 1, 7, 1, 1, 1, 1, 1, 1, 10, 7, 1, 10, 7, 1, 1, 1, 1, 1, 7, 1, 10, 1, 1, 1, 1, 1, 1, 1, 7, 1,...)
"3" "RF00001" c(1, 1, 1, 1, 9, 1, 1, 1, 11, 1, 1, 1, 1, 1, 1, 3, 7, 5, 1, 1, 1, 9, 1, 14, 10, 7, 1, 5, 1, 1, 1, 1, 1, 7, 1, 2, 1, 1, 1, 9, 1, 7, 1, 1, 1, 1, 1, 1, 10, 7, 1, 10, 7, 1, 1, 1, 1, 1, 7, 1, 10, 1, 1, 1, 1, 1, 1, 1, 7, 1,...)
This is where I've hit a dead end. I don't understand what is happening, and I would appreciate it if you could help me out.