
How can I get the following output?

df | df$nsp
class1 | 1
class2 | 2
class3 | 3
class1 | 1
class3 | 3

vicky
  • So basically I have three classes to deal with (class1, class2 and class3) across the entire data set of thousands of rows. The problem is that random forest doesn't work when I apply it, so I need to convert my labels to numeric form (1, 2 and 3, in a separate column) and then try random forest again. – vicky Nov 23 '21 at 19:02
  • Please put the extended explanation in the question. Also include a short (max ~5-10 rows/cols) example of your input and desired output. – Andre Wildberg Nov 23 '21 at 19:09

1 Answer


If I understood correctly, your labels are strings such as 'class1', 'class2', ... For example:

df = data.frame(nsp = c('class1','class2','class3','class4'))

And you want to keep only the number so that you can use RF. You can use gsub to extract the digits from each label and as.numeric to convert the resulting string to a number. Here I'm using the tidyverse library, but it is not required.

library(tidyverse)
  
df %>% mutate(class= as.numeric(gsub(".*?([0-9]+).*", "\\1", nsp)))

     nsp class
1 class1     1
2 class2     2
3 class3     3
4 class4     4

With base R you can use:

df$class = as.numeric(gsub(".*?([0-9]+).*", "\\1", df$nsp))
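
As a cross-check on the conversion above, here is a minimal sketch that compares the gsub result with the integer codes of a factor. The sample data, the column names `nsp_factor` and `code`, and the explicit level order are assumptions for illustration, not taken from the question. If the randomForest package is what is being used (the question does not say), note that it treats a factor response as classification and a numeric response as regression, so keeping a factor column may be the more useful form.

```r
# Hypothetical sample mirroring the question: three labels repeating over rows
df <- data.frame(nsp = c("class1", "class2", "class3", "class1", "class3"),
                 stringsAsFactors = FALSE)

# Approach from the answer: extract the digits and convert them to numeric
df$class <- as.numeric(gsub(".*?([0-9]+).*", "\\1", df$nsp))

# Alternative: a factor keeps the labels and exposes integer codes.
# The level order is given explicitly so that class1 -> 1, class2 -> 2, class3 -> 3.
df$nsp_factor <- factor(df$nsp, levels = c("class1", "class2", "class3"))
df$code <- as.integer(df$nsp_factor)

# Both approaches should agree row by row
stopifnot(all(df$class == df$code))
print(df)
```

The factor route does not depend on the labels containing digits, so it should also work for labels such as 'low', 'mid' and 'high', as long as the levels are listed in the intended order.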
RobertoT
  • Right, this is what I want. Why am I unable to get my inputs into the question like you have done? I need to look into that. But yes Robert, that's what I am looking for. – vicky Nov 23 '21 at 19:16
  • The first column is df, and I am looking to generate df$nsp with 1, 2, 3 corresponding to whatever class is specified in the corresponding cell. – vicky Nov 23 '21 at 19:18
  • If you just want to edit the same column (I would recommend keeping a backup so you can check that the result is right), you can write: `df$nsp= as.numeric(gsub(".*?([0-9]+).*", "\\1", df$nsp))` . That overwrites the nsp column with the converted values instead of creating a new column. – RobertoT Nov 23 '21 at 19:20
  • So there are only three classes (class1, class2 and class3), occurring over and over in the rows. – vicky Nov 23 '21 at 19:20
  • If you don't provide a dataframe as an example, I can't go deeper into the problem. As I understood it, your dataset has a column holding the labels for RF ('class1', 'class2', etc.) and you want it to say just 1, 2, 3... Then just use the code from the answer, changing the column names to the ones in your dataset. – RobertoT Nov 23 '21 at 19:22
  • I can do that manually, but this is 50,000 rows. – vicky Nov 23 '21 at 19:24
  • You can select a subset (a few rows/columns) and use dput(). Another way is to create a minimal reproducible example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – RobertoT Nov 23 '21 at 19:27
  • However, I'm looking now at the new edited question and your data.frame is in rows? See the link to provide enough details for your question: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – RobertoT Nov 23 '21 at 19:28
  • Trying again, so basically class1,class2 & class3 are my labels for rest of the data. I am looking to generate another column converting my labels into 1,2 and 3 and then I will make the original class column NULL – vicky Nov 23 '21 at 19:28
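
The two suggestions from the comments above (back up the column before overwriting it, and share a small sample with dput()) can be sketched as follows. The data here is randomly generated as a stand-in for the real 50,000-row data set, and the name `nsp_backup` is hypothetical.

```r
# Hypothetical stand-in for the real 50,000-row data set
set.seed(1)
df <- data.frame(nsp = sample(c("class1", "class2", "class3"), 50000, replace = TRUE),
                 stringsAsFactors = FALSE)

# Keep a backup of the original labels before overwriting the column
nsp_backup <- df$nsp
df$nsp <- as.numeric(gsub(".*?([0-9]+).*", "\\1", df$nsp))

# Cross-tabulate backup against converted values: each label should map to one number
print(table(original = nsp_backup, converted = df$nsp))

# Produce a small, copy-pastable sample suitable for posting in a question
dput(head(data.frame(nsp = nsp_backup, converted = df$nsp), 5))
```

The cross-tabulation should have non-zero counts only on the diagonal; anything off the diagonal would mean a label was converted to the wrong number.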