I'm rather new to this and could use some help. I'd like to achieve two things in R. At the moment I have a dataset called "researchdata".

1. I'd like to manipulate the data in one specific column.

Let's say I want to change the text "New York" to "NY" in the column/variable "City" (so not in the whole dataset at once). I'm not sure whether the command is different, but I'd also like to do that with a number, for example change "-1" to "NA".

2. Deleting a specific value in a specific column

How do I delete the NAs or missing values, or actually any kind of value or string, for a specific column? Let's say I want to delete both the values "NA" and "-1" for the column/variable "City".

I tried some commands but couldn't get them to work; they weren't what I was looking for. I hope you can help. Thanks in advance.

Thundersheep
  • Please provide a [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example. – Prradep Jul 01 '17 at 12:33
  • As for question 1, I came up with this: `mydata$city[mydata$city == -1] <- NA`. So I managed to turn the -1 value into NA. But I can't turn the 0 value into, for example, "hello": `mydata$city[mydata$city == 0] <- HELLO` gives `Error: object 'HELLO' not found`. As for question 2, I then tried to delete NA only for a specific column with `mydata_sub <- na.omit(mydata$city)`, but instead of a subset I get "values": "large integer (183381 elements, 1.3 Mb)". – Thundersheep Jul 01 '17 at 12:47
  • Please respond to comments by *editing your original post*, instead of replying by comment – CPak Jul 01 '17 at 13:00

1 Answer

1. To manipulate data in a specific column, look into dplyr::mutate:

df <- data.frame(A = 1:5, B = 1:5)
library(dplyr)
# Replace the value 3 in column A with NA; all other values are kept
df %>% mutate(A = ifelse(A == 3, NA, A))
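
For the question's own example, the same kind of replacement can also be written with plain base R indexing, which is the style the asker tried in the comments. This is a minimal sketch: the `researchdata` frame below is a made-up stand-in, and the `Score` column is a hypothetical numeric column added only to show the -1 case.

```r
# Hypothetical stand-in for the asker's "researchdata"; all values are made up
researchdata <- data.frame(
  City  = c("New York", "Boston", "New York", "Chicago"),
  Score = c(10, -1, 7, -1),
  stringsAsFactors = FALSE
)

# Change "New York" to "NY" in the City column only
researchdata$City[researchdata$City == "New York"] <- "NY"

# Change -1 to NA in the Score column only
researchdata$Score[researchdata$Score == -1] <- NA

print(researchdata)
```

Because the assignment targets `researchdata$City` or `researchdata$Score` directly, the change is stored in the data frame and no other column is touched.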

2. To remove NAs from your data, you can do:

df1 <- df %>% mutate(A=ifelse(A==3,NA,A))
df1[complete.cases(df1),]

or use dplyr::filter:

# df1 (from above) contains an NA in column A
df1 %>% filter(!is.na(A))

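Note that these operations remove the entire row where A is NA. `complete.cases(df1)` also drops rows that have an NA in any other column, whereas `filter(!is.na(A))` only checks column A.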
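
To drop rows based on one column only, including a placeholder such as "-1", a logical row filter in base R is one option. The sketch below assumes a made-up `researchdata` frame in which `City` is a character column and `Age` is a hypothetical second column that also contains an NA.

```r
# Hypothetical data: NA appears in both columns, "-1" marks a missing city
researchdata <- data.frame(
  City = c("NY", NA, "-1", "Boston", "Chicago"),
  Age  = c(34, 51, 28, NA, 45),
  stringsAsFactors = FALSE
)

# Keep rows whose City is neither NA nor "-1"; NAs in other columns are left alone
keep <- !is.na(researchdata$City) & researchdata$City != "-1"
cleaned <- researchdata[keep, ]

print(cleaned)
```

Indexing the data frame by rows keeps all columns together. By contrast, `na.omit(mydata$city)` operates on a single column and returns a vector, which is why it showed up under "values" rather than as a data set.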

(It sounds like you're just trying to learn, but to get the best help, it's best to provide a small data set and a specific problem (with expected output).)

CPak
  • Thanks for your reply. Yes, I am learning, but at the same time doing real research. For now I took a step back and used a free dataset, "LungCapData.csv", which you can find here: https://docs.google.com/file/d/0BxQfpNgXuWoIWUdZV1ZTc2ZscnM/edit I edited your command a bit for number 1: `mydata %>% mutate(Age=ifelse(Age==15,NA,Age))`. It did change the rows with an age of 15 to NA in the printed output in R, but when I refresh and re-open my dataset the 15 is still there. How come? PS: Something is off with my reply format. – Thundersheep Jul 01 '17 at 14:08
  • PS: Can it only be changed to NA, not to an arbitrary number or word? – Thundersheep Jul 01 '17 at 14:10
  • PPS: `mydata[complete.cases(mydata),]` deletes all NAs from my dataset, and I don't want that. I only want to remove the NAs from the Age column. I suppose `mydata %>% filter(!is.na(Age))` would work. However, I can't test it since the 15 isn't changed to NA in my dataset. – Thundersheep Jul 01 '17 at 14:12
  • Try `mydata <- mydata %>% mutate(Age=ifelse(Age==15,NA,Age))` – CPak Jul 01 '17 at 14:48
  • Yeah, that worked! And the command for question 2 as well. You really helped me, but I've got one small thing left. In summary, I can now set a value to NA in any column, I can delete the NAs in a specific column without deleting the other NAs, and I can also clear the entire sheet of NAs. Thanks for that. But one more thing: for now I can only replace values with NA. Is there also a way to change, for example, the number 72 in Height to a string of text like "hello", or to a number such as 999? – Thundersheep Jul 01 '17 at 15:26
  • I can help you but you're better off trying to learn how to use `mutate`, and the `ifelse`. Google is your friend... – CPak Jul 01 '17 at 15:37
  • That might be but R simply does not let me change it into that. Your function only works with NA. Can you put me in the right direction at least? – Thundersheep Jul 01 '17 at 15:48
  • I'm pointing you in the right direction: https://www.rdocumentation.org/packages/dplyr/versions/0.5.0 https://www.rdocumentation.org/packages/base/versions/3.4.0/topics/ifelse – CPak Jul 01 '17 at 15:52
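
As a sketch of the last open point in this thread: `ifelse` accepts any replacement value, not only NA, as long as text is quoted. An unquoted `HELLO` is read as the name of an object, which is what produces `Error: object 'HELLO' not found`. The `mydata` frame below is a made-up stand-in with a hypothetical `Height` column, not the LungCapData file.

```r
# Hypothetical data loosely modelled on the Height column mentioned above
mydata <- data.frame(Height = c(65, 72, 70, 72))

# Numeric replacement: the column stays numeric
with_number <- mydata
with_number$Height <- ifelse(with_number$Height == 72, 999, with_number$Height)

# Text replacement: the string must be quoted, and the whole column becomes character
with_text <- mydata
with_text$Height <- ifelse(with_text$Height == 72, "hello", with_text$Height)

print(with_number$Height)
print(class(with_text$Height))
```

The same `ifelse` calls can be placed inside `mutate`. In either form the result has to be assigned back (here with `<-`), otherwise the original data frame is left unchanged.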