I am new to this forum, so apologies for any mistakes. I have a data frame (a classification of substances into classes) in the following format:
| | A | B | C | D |
---|---|---|---|---|
1 | Organic compounds | Benzenoids | Benzene | NA |
2 | Organic compounds | Benzenoids | Benzene | NA |
3 | Organic compounds | Organic oxygen compounds | NA | NA |
4 | NA | NA | NA | NA |
5 | Organic compounds | Benzenoids | NA | NA |
In the end I need a data frame with two columns: the combined class and how often it occurs. Rows that contain only NA should be left out. The result should look something like this:
class | count |
---|---|
Organic compounds; Benzenoids; Benzene | 2 |
Organic compounds; Organic oxygen compounds | 1 |
Organic compounds; Benzenoids | 1 |
What should my first step be? I tried to create a new column by pasting together the contents of all the other columns, like this:
df$class <- paste(df$A, df$B, df$C, df$D, sep = "; ")
But the result is:
class |
---|
Organic compounds; Benzenoids; Benzene; NA |
Organic compounds; Benzenoids; Benzene; NA |
Organic compounds; Organic oxygen compounds; NA; NA |
NA; NA; NA; NA |
Organic compounds; Benzenoids; NA; NA |
What would be a reasonable approach to get from here to the final result, without the NA values ending up in the class string?
Thanks a lot!
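One possible approach, sketched in base R: paste only the non-NA values of each row, drop the rows that end up empty, and then tabulate the resulting strings. This is a minimal sketch, not a definitive solution. It assumes that A to D are character columns (not factors) and that the example `df` below, rebuilt by hand from the table above, matches the real data; the names `cols` and `counts` are made up for illustration.

```r
# Example data reconstructed from the table in the question (an assumption)
df <- data.frame(
  A = c("Organic compounds", "Organic compounds", "Organic compounds",
        NA, "Organic compounds"),
  B = c("Benzenoids", "Benzenoids", "Organic oxygen compounds",
        NA, "Benzenoids"),
  C = c("Benzene", "Benzene", NA, NA, NA),
  D = NA_character_,
  stringsAsFactors = FALSE
)

# Columns that make up the classification (hypothetical helper variable)
cols <- c("A", "B", "C", "D")

# For each row, keep only the non-NA values and join them with "; ".
# A row that contains only NAs becomes the empty string "".
df$class <- apply(df[, cols], 1, function(x) paste(x[!is.na(x)], collapse = "; "))

# Drop the empty rows, then count how often each class string occurs
counts <- as.data.frame(
  table(class = df$class[df$class != ""]),
  responseName = "count",
  stringsAsFactors = FALSE
)

# Most frequent class first
counts <- counts[order(-counts$count), ]
print(counts)
```

The difference from the `paste(df$A, df$B, ...)` attempt is that `paste()` converts NA to the literal string "NA", so the NAs have to be filtered out before pasting rather than afterwards.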