
I have already read "R: replace NA with item from vector", but I need a bit more detail.

This operation is normally done in FoxPro, but FoxPro will soon no longer be available to us.

I have a fixed-width data set. This is the code I have used so far:

```r
library(readr)  # provides read_fwf() and fwf_widths()

x <- read_fwf(file = "NEW.DATA1", skip = 0,
              fwf_widths(c(2,2,6,4,2,2,2,4,2,1,9,8,9,9,9,9,9,14,8,14,12,1)))

write.csv(x, file = "newdata1a.csv", row.names = FALSE)
```
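Because the replacement values 04 and 07 (and area codes such as 000995) are codes with leading zeros, it may help to read every field as text. The following is a minimal sketch, assuming readr is installed: `col_types = cols(.default = col_character())` tells `read_fwf()` not to guess numeric types. The two-line temporary file and its contents are hypothetical and use only the first three field widths. By default `read_fwf()` trims whitespace, so a blank field comes back as `NA`, which matches the behavior described below.

```r
library(readr)

# Hypothetical two-line sample using only the first three field widths (2, 2, 6).
# The second field is blank on the first line, as in the real data.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("01  000995",
             "0104000123"), demo_file)

# Read every field as character so "04" and "000995" keep their leading zeros.
# The blank second field on line 1 is read as NA.
demo <- read_fwf(
  demo_file,
  col_positions = fwf_widths(c(2, 2, 6)),
  col_types = cols(.default = col_character())
)
print(demo)

# For the real file (not run here):
# x <- read_fwf("NEW.DATA1",
#               fwf_widths(c(2,2,6,4,2,2,2,4,2,1,9,8,9,9,9,9,9,14,8,14,12,1)),
#               col_types = cols(.default = col_character()))
```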

The second column sometimes contains NA because that field is blank in the original text file (newdata1). I want to replace each NA with 04 or 07, depending on the value in the third column. This may sound rudimentary, but the articles I have found don't match my situation.
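A minimal base-R sketch of the goal, using the rule the asker gives in the comments (certain area codes in column 3 map to 07, everything else to 04). The data frame, its column names `col2`/`col3`, and its values are hypothetical stand-ins. It also assumes that "0000998" in the comments is a typo for the six-character "000998", since the third field is 6 characters wide. `x[[2]]` is used instead of `x[, 2]` because `[[` always returns a plain vector, whereas `x[, 2]` on a tibble (which `read_fwf()` returns) is still a one-column tibble.

```r
# Hypothetical stand-in for columns 2 and 3 of the real data.
x <- data.frame(
  col2 = c("04", NA, NA, "07", NA),
  col3 = c("000123", "000995", "000456", "000ALL", "000ALL"),
  stringsAsFactors = FALSE
)

# Area codes that should get "07"; everything else gets "04".
# Assumes "0000998" in the comments was meant to be "000998".
codes_07 <- c("000995", "000996", "000998", "000999", "000ALL")

# TRUE for rows where column 2 is missing.
col2_na <- is.na(x[[2]])

# Fill only the missing rows. ifelse() needs all three arguments: test, yes, no.
x[[2]][col2_na] <- ifelse(x[[3]][col2_na] %in% codes_07, "07", "04")

print(x)
```

Existing values in column 2 are left alone; only the `NA` rows are overwritten.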

Edited by Krantz; asked by Tim Wilcox

  • Can you elaborate on "based on the values of the third column"? What about those third column values will tell you whether to use 04 or 07? Essentially you want something like `col2_na = is.na(x[, 2]); x[col2_na, 2] = ifelse(x[col2_na, 3] , "04", "07")`. – Gregor Thomas Aug 30 '17 at 18:49
  • If x3 (the area, or third column) is 000995, 000996, 0000998, 000999, or 000ALL, then make the second column 07. If not, make it 04. – Tim Wilcox Aug 30 '17 at 18:52
  • All right, so use the code in my above comment with `... = ifelse(x[col2_na, 3] %in% c('000995', '000996', '0000998', '000999', '000ALL'), "07", "04")`. – Gregor Thomas Aug 30 '17 at 19:03
  • OK, thanks. So that I can decipher the lingo better in the future: in `col2_na = is.na(x[, 2]); x[col2_na, 2]`, is this telling R that we are only concerned with the second position (column) and to look for NA values only in that column? – Tim Wilcox Aug 30 '17 at 19:06
  • More or less, yes. Run each piece and look at what's there. Data frames are indexed as `data_frame_name[rows, columns]`. `col2_na` is TRUE when the second column is missing. `x[col2_na, 2]` is those missing values. `x[col2_na, 3]` is the corresponding values from column 3. `x[col2_na, ]` is all the columns, but only the rows where the second column is missing. – Gregor Thomas Aug 30 '17 at 19:11
  • Error in ifelse(x[col2_na, 3] %in% c("000995,000996", "0000998", "000999", : argument "yes" is missing, with no default How does one resolve this error? – Tim Wilcox Aug 30 '17 at 20:07
  • You've got at least one syntax error: `"000995,000996"` should probably be `"000995", "000996"`, though that won't cause the error you show. Could you edit the code you ran into the question? – Gregor Thomas Aug 30 '17 at 20:09
  • Better if you share enough data [to make a minimal reproducible example](https://stackoverflow.com/q/5963269/903061). We can't test anything without your data. Use `dput()` to make shared data copy/pastable. – Gregor Thomas Aug 30 '17 at 20:10
  • I work for a state government. Is it possible to share/send a cleaned up version of this in the form of a text file/excel doc? – Tim Wilcox Aug 30 '17 at 20:28
  • We only need enough data to show the problem. 5 rows of data should be plenty, and all we need are columns 2 and 3. The link in my above comment has lots of recommendations for sharing data: you can use `dput()` on a subset of your data, you can share code to simulate sample data, lots of easy options. – Gregor Thomas Aug 30 '17 at 20:32
  • I have to put this project on ice for a bit. I'm attempting to load these fixed-width files into SQL instead of manipulating them in R. Thanks for the help. – Tim Wilcox Aug 30 '17 at 23:56
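Following the suggestion to share a few rows with `dput()`, here is a minimal sketch. The data frame below, with its column names and values, is a hypothetical stand-in for the real `x` read from NEW.DATA1. It takes the first 5 rows of columns 2 and 3 only, as requested, and prints R code that anyone can paste to rebuild exactly that subset.

```r
# Hypothetical stand-in for the real data; only columns 2 and 3 matter here.
x <- data.frame(
  col1 = c("01", "01", "01", "01", "01", "01"),
  col2 = c("04", NA, NA, "07", NA, "04"),
  col3 = c("000123", "000995", "000456", "000ALL", "000ALL", "000789"),
  stringsAsFactors = FALSE
)

# First 5 rows of columns 2 and 3 only.
sample_rows <- head(x[, c(2, 3)], 5)

# Prints R code that recreates sample_rows; paste this output into the question.
dput(sample_rows)
```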
