
I am new to programming. When I load my data into R, I find:

> str(g)
'data.frame':    253227 obs. of  2 variables:
 $ ID             : int  7896741 7896743 7896745 7896747 7896749 7896751 7896753 7896755 7896757 7896758 ...
 $ gene_assignment: Factor w/ 85855 levels "","---","AB001736 // IGLJ3 /// AB001733 // IGLJ3 /// ENST00000390609 // IGHV3-23 /// X14584 // IGHV3-23 /// BC072419 // "| __truncated__,..: 16002 81923 16018 2 2 2335 2 2392 5497 5497 ...
  1. How can I remove two categories ("";"---") from $gene_assignment? What type of code should I use?

  2. “ AB001736 // IGLJ3 /// AB001733 // IGLJ3 /// ENST00000390609 // IGHV3-23 /// X14584 // IGHV3-23 /// BC072419 // "|

This factor contains many entries, but some are common, like ENST00000390609 or AB001733. How can I remove these values?

Dr. Fabian Habersack
gene
  • Possible duplicate of [Conditionally Remove Dataframe Rows with R](https://stackoverflow.com/questions/8005154/conditionally-remove-dataframe-rows-with-r) – Dr. Fabian Habersack Aug 25 '17 at 13:00

1 Answer


I'm not really sure what you're asking, so I can only interpret what you want to do. Ideally I would have left a comment, but it tells me that 50+ reputation is needed to do that.


So, if you want to replace the two categories of your factor variable with missing values (NAs), then this should work:

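data.frame$gene_assignment <- ifelse(data.frame$gene_assignment == ..., NA, as.character(data.frame$gene_assignment))

where ... is the targeted value or category of your variable, NA is the new value (missing), and the last argument keeps all your other values and categories unchanged. The as.character() call matters because gene_assignment is a factor: without it, ifelse() returns the underlying integer codes instead of the labels.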
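As a minimal sketch of that replacement, here is the same idea on made-up data. The data frame name `g` follows the question; the values and the use of `%in%` to handle both categories in one step are assumptions for illustration.

```r
# Toy data standing in for the asker's data frame `g` (values are made up).
g <- data.frame(
  ID = 1:5,
  gene_assignment = factor(c("", "---", "AB001736 // IGLJ3", "---", "X14584 // IGHV3-23"))
)

# Work on the labels, not the factor, so ifelse() does not return integer codes.
labels <- as.character(g$gene_assignment)

# Replace both unwanted categories with NA, then rebuild the factor.
# Rebuilding drops "" and "---" from the levels as well.
g$gene_assignment <- factor(ifelse(labels %in% c("", "---"), NA, labels))

levels(g$gene_assignment)
sum(is.na(g$gene_assignment))
```

Rebuilding the factor at the end is a design choice: it keeps the column a factor, as in the original data, while removing the two levels that no longer occur.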

The same code can be used, obviously, for your second problem. Simply fill in the targeted value (1 at a time of course) and the value you want to replace it by.
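One caveat: an exact comparison only matches whole values. If the identifiers sit inside longer strings, as the str() output in the question suggests, the entries would need to be split first. The following is only a sketch under that assumption: it treats " /// " as the separator between entries and the text before " // " as the identifier. The helper name `remove_entries` and the sample strings are hypothetical.

```r
# Made-up strings in the "ID // symbol /// ID // symbol" layout seen in the question.
x <- c(
  "AB001736 // IGLJ3 /// AB001733 // IGLJ3 /// ENST00000390609 // IGHV3-23",
  "X14584 // IGHV3-23"
)
drop_ids <- c("ENST00000390609", "AB001733")

# Hypothetical helper: drop every entry whose identifier is listed in `ids`.
remove_entries <- function(s, ids) {
  entries <- strsplit(s, " /// ", fixed = TRUE)[[1]]
  # The identifier is assumed to be the part before the first " // ".
  accession <- sub(" //.*$", "", entries)
  paste(entries[!(accession %in% ids)], collapse = " /// ")
}

cleaned <- vapply(x, remove_entries, character(1), ids = drop_ids, USE.NAMES = FALSE)
cleaned
```

On a factor column, the same function could be applied to as.character() of the column before converting back with factor().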

You can also create dummies very easily in that way, if that's what you want:

data.frame$dummy <- ifelse(data.frame$gene_assignment == ..., 1, 0)
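For instance, a dummy flagging the "---" category could look like this on made-up data (the data frame name `g` follows the question; the values are assumptions):

```r
# Toy data; only the gene_assignment column matters here.
g <- data.frame(
  gene_assignment = factor(c("", "---", "AB001736 // IGLJ3", "---"))
)

# 1 where the value equals "---", 0 everywhere else.
g$dummy <- ifelse(g$gene_assignment == "---", 1, 0)
g$dummy
```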

If you want to delete rows (listwise) from your data.frame based on a given value (e.g. "---") of a specific variable (e.g. $gene_assignment), this: data.frame[!(data.frame$gene_assignment=="---"),] or this: subset(data.frame, gene_assignment!="---") should do the trick. You should watch your NAs, though.
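To illustrate why the NAs need watching, here is a small sketch on made-up data that compares the two approaches above with a variant based on `%in%`. The data frame name `g` follows the question; the values and the `%in%`/`droplevels()` variant are assumptions added for illustration.

```r
# Toy data with one missing gene_assignment (values are made up).
g <- data.frame(
  ID = 1:5,
  gene_assignment = factor(c("", "---", "AB001736 // IGLJ3", NA, "X14584 // IGHV3-23"))
)

# Logical indexing with `==`: comparing NA gives NA, and an NA index
# produces a row in which every column is NA.
by_index <- g[!(g$gene_assignment == "---"), ]
nrow(by_index)

# subset() treats an NA condition as FALSE, so the NA row is dropped silently.
by_subset <- subset(g, gene_assignment != "---")
nrow(by_subset)

# %in% never returns NA, so rows with a missing value are kept unchanged.
# droplevels() then removes the unused "" and "---" levels.
kept <- droplevels(g[!(g$gene_assignment %in% c("", "---")), ])
nrow(kept)
levels(kept$gene_assignment)
```

Which behaviour is right depends on whether rows with a missing gene_assignment should stay in the data.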


Hoping this is helpful.

If not, and in case I misinterpreted your question, take a look at what you can do to improve the way you write questions. Details are key, and a 'reproducible example' (some exemplary data) will make it easier to respond and will result in better answers for you (see here, here, and here).

Dr. Fabian Habersack
  • Also, you should do some research before asking a question so that you don't accidentally duplicate existing ones. Take a look at this [post](https://stackoverflow.com/questions/31331217/remove-values-in-vector-from-double-variable-in-r), and [this](https://stackoverflow.com/questions/8005154/conditionally-remove-dataframe-rows-with-r) too. – Dr. Fabian Habersack Aug 25 '17 at 12:56
  • @gene: I'm commenting here because it doesn't let me write under your "answer" (which I flagged as "not an answer" btw). **Regarding your question:** Seems like the problem occurred when you tried to load your data in R. Make sure to specify the right separator of your columns (as used in the Excel sheet). For `read.csv` the default is ","; for `read.table` it is whitespace, but your file may use something else. `sep="..."` will do the trick. Run `?read.table` for more info on loading data. Also, do some research on StackOverflow: there are lots of people that had similar questions and lots of brainy solutions. – Dr. Fabian Habersack Aug 25 '17 at 14:36
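
The separator advice in the last comment can be sketched as follows. The semicolon separator and the inline sample text are assumptions for illustration; the real file's separator has to be checked in the file itself.

```r
# Made-up file contents, supplied inline through the `text` argument
# so the example does not need a file on disk.
raw <- "ID;gene_assignment\n7896741;---\n7896743;AB001736 // IGLJ3"

# sep must match the separator actually used in the file (assumed ";" here).
# stringsAsFactors = FALSE keeps gene_assignment as character rather than factor.
g <- read.table(text = raw, header = TRUE, sep = ";", stringsAsFactors = FALSE)
str(g)
```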