
Salutations

I'm currently building a neural network and need to structure the data properly. One of the data columns contains string data that needs to be converted to numeric. The only problem is that the string data in each row looks like QWERTGCD, AWERTKRD, TWERTKRR, etc. There are over 1000 rows, each holding the same or a different string like those in the example. I don't know how to convert that many strings into categorical data on this scale. The same goes for the labels portion.

So far I have this to start with

dataset$Box = as.numeric(factor(dataset$Box, levels = c(), labels = c()))

Not sure if I am overthinking this, but I can't figure out how to supply the levels and labels without painstakingly going through the data and entering them myself.

Here's an example of the data being worked with.

B,11979,13236,1261,3,QWERTGCD,1
B,475514,476069,559,33,QWERTOOD,1
C,65534,65867,337,1,QWERAEER,1
C,73738,74657,923,2,AWERTWED,1

Thanks

TheDeezer
  • Hi, welcome to *Stack Overflow*, in order that we can help you, please provide example data and the steps you've tried so far. Consider [*How to make a great reproducible example*](https://stackoverflow.com/a/5963610/6574038), thank you. – jay.sf Mar 06 '18 at 19:58
  • A good reproducible example involves code that will produce an object that has all the characteristics of the object you're having a problem with, so that the solution to your problem can be applied to the example object you've provided. Here, if your data is a data frame, you should provide example data that is a data frame. If it's a matrix, you should provide a matrix. If it's a raw set of comma separated values, then you have more to work on before getting to this question (how do you convert a column of data from character to numeric). – De Novo Mar 06 '18 at 20:14

1 Answer

Without a reproducible example, it's hard to know exactly what you need, but in general, one thing R is good at is running operations on entire columns all at once. You're just converting a column in dataset that is named Box from a string to numeric, going through a factor. factor() finds all the unique values in your column for you. So you don't need to specify them.

dataset$Box <- as.numeric(factor(dataset$Box))

will take the Box column in dataset and convert it from class character to class numeric, numbering the character values in Box in alphanumeric order (unless you specify otherwise). The column may even already be a factor, depending on how your dataset was generated. You can check with class(dataset$Box). If that returns "factor", then you just need to run dataset$Box <- as.numeric(dataset$Box).
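As a rough illustration of the conversion described above, here is a minimal sketch built from the four sample rows in the question. Only the Box column name comes from the question. Every other column name (Label, Start, End, Len, Count, Flag) is a made-up placeholder, and it is an assumption that the first column is the label column. The sketch also keeps the factor levels around, since the numeric codes alone do not tell you which string they came from.

```r
# Sample rows from the question. Every column name except Box is a
# hypothetical placeholder, since the question does not name the columns.
dataset <- data.frame(
  Label = c("B", "B", "C", "C"),
  Start = c(11979, 475514, 65534, 73738),
  End   = c(13236, 476069, 65867, 74657),
  Len   = c(1261, 559, 337, 923),
  Count = c(3, 33, 1, 2),
  Box   = c("QWERTGCD", "QWERTOOD", "QWERAEER", "AWERTWED"),
  Flag  = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Build the factor once so the code-to-string mapping can be kept.
# factor() collects the unique values itself; no levels are typed by hand.
box_factor <- factor(dataset$Box)
box_levels <- levels(box_factor)  # element i is the string coded as i

# Replace the strings with their integer codes (stored as numeric).
dataset$Box <- as.numeric(box_factor)

# The same approach applied to the (assumed) label column.
dataset$Label <- as.numeric(factor(dataset$Label))

# Recover the original strings from the codes when needed.
original_box <- box_levels[dataset$Box]
```

Indexing box_levels by the numeric column gives back the original strings, which can be handy when interpreting the network's inputs or outputs later on.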

De Novo