
The data set I'm using is filled with decimal numbers and NAs. A sample can be found below:

>head(df, n=5)
             cheading1   cheading2  cheading1  cheading3  cheading1   cheading1
        1    1.0925485       NA     0.714186       NA     0.008650       NA
        2    1.0564646       NA     0.714286       NA     0.008651       NA
        3    0.9816899       NA     0.714186       NA     0.008652       NA
        4    0.9857995       NA     0.714186       NA     0.008651       NA
        5    0.9760769       NA     0.714086       NA     0.011350       NA

> dim(df)
[1] 16500   199

Please do not assume that the columns in the sample represent a continuing stream of the same data type. Further on, as the row number increases, column 1 becomes filled with NAs, and the other columns behave the same way. All columns contain both decimal numbers and NAs, and there are also zeros throughout this data frame.
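
To give a rough idea of the structure, a toy frame with a similar mix of decimals, NAs and zeros might look like this (made-up values, not my actual data):

# made-up illustration only -- values loosely based on the head() sample above
toy <- data.frame(cheading1 = c(1.0925485, 0.9816899, 0),
                  cheading2 = c(NA, NA, NA),
                  cheading3 = c(0.008650, NA, 0.011350))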

So, of course, when I try to take the natural log of the whole data frame, it returns an error, which I put down to the non-numeric "NA" values:

log(df, base=exp(1))

> Error in Math.data.frame(df, base = exp(1)) : non-numeric variable in data frame: cheading2
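
In case it's relevant, I understand from some searching that I can at least check which columns R actually treats as numeric with something like this (general R, I haven't fully made sense of the output for my data yet):

str(df)                         # compact overview of each column's type
sapply(df, is.numeric)          # TRUE/FALSE per column
which(!sapply(df, is.numeric))  # names/positions of the non-numeric columns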

I tried using the na.rm argument to tell R to exclude the NAs while taking the natural log of all numeric values, but again it returned an error:

log(df, base=exp(1), na.rm=T)

> Error in log(df, base = exp(1), na.rm = T) : unused argument (na.rm = TRUE)
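
For what it's worth, a quick test on a plain vector suggests log() already handles NAs element-wise, so no na.rm seems to be needed at that level (toy numbers only):

# log() works element-wise: NAs stay NA, zeros become -Inf, and no error is raised
log(c(NA, 1, 0, 10))   # -> NA  0.000000  -Inf  2.302585 (no error, no na.rm needed)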

So how does one take the natural log of this entire data frame (with column headers), ignore all NAs, and end up with another table, e.g. lndf, which still has its headers and NAs?

I've also tried to use a for loop, but with the same outcome (too many NaNs produced).

I plan on using this data in a fixed-effects regression once this has been solved. I'm happy to answer any questions that may arise.

I've also tried taking the log of every single numeric column and then combining them. That still doesn't work:

lnoecd<- log(df$oecd, base=exp(1))
lng20<- log(df$g20, base=exp(1))
lnoecdna<- log(df$oecdna, base=exp(1))
lnifscode<- log(df$ifscode, base=exp(1))
lnccode<- log(df$ccode, base=exp(1))
lnyear<- log(df$year, base=exp(1))
lnoxfx<- log(df$oxfx, base=exp(1))
lnncusd2011<- log(df$ncusd2011, base=exp(1))
lnncppp2005<- log(df$ncppp2005, base=exp(1))
...
...


 lndf <- c(lnoecd, ...the lot

Whenever I take the log of any numeric column and then check the dimensions of the resulting column(s), it just returns NULL.
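
From what I've read, dim() is NULL for a plain vector (length() gives the number of values instead), and c() flattens everything into one long vector, so I suspect I need something like data.frame() to keep the table shape. An untested sketch using two of the columns from above:

lnoecd <- log(df$oecd)             # log() already defaults to base exp(1)
lng20  <- log(df$g20)

dim(lnoecd)                        # NULL -- a plain vector has no dim attribute
length(lnoecd)                     # number of values instead

lndf <- data.frame(lnoecd, lng20)  # keeps rows and columns, unlike c()
dim(lndf)                          # two dimensions again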

Note: I'm very new to programming and have started using R as a foot in the door. Apologies in advance for any gaps in basic knowledge. I hope those who try to help me will be satisfied with how I'm coming across.

  • `NA`s can be numeric. They are not the problem. Look at how you create the data.frame and make sure that all columns are numeric. – Roland Apr 06 '16 at 10:32
  • You can quickly check that you've done @Roland's suggestion with `str(df)`. You should get something like this `'data.frame': 16500 obs. of 199 variables: $ cheading1: num 16500 $ cheading2: num 16500...`. Use `as.numeric` to convert columns; you may also find `NA_real_` useful as it is the numeric literal NA value (by contrast with vanilla `NA`, which defaults to logical, i.e. boolean). – Philip Apr 06 '16 at 10:46
  • Welcome to Stack Overflow! [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269). Please add output of `dput(head(df))` to your post. Also, try this example: `log(c(NA,NaN,-Inf,Inf,0,1,-1,10))` vs `log("a")`. – zx8754 Apr 06 '16 at 11:15
  • try adding `header=TRUE` when you read in your data ... – Ben Bolker Apr 06 '16 at 14:09
  • "unused argument (header = TRUE) – Cailloux Apr 06 '16 at 15:51
  • how are you reading your data? – Ben Bolker Apr 06 '16 at 16:07
  • `df<-read.dta(file location)` – Cailloux Apr 07 '16 at 08:10
  • I should have mentioned that the data is from stata – Cailloux Apr 07 '16 at 08:12
