I'm working on a project for my Economics capstone with a very large data set. This is my first time programming, and I had to merge 16 data sets, each with between 30,000 and 130,000 observations. I ran into a problem merging them because some data sets contain more columns than others, but I was able to address it using "rbind.fill". Afterwards, I attempted to run a regression but encountered an error. The error was

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

Here is the original code for the regression

ols_reg_mortcur1 <- lm(MORTCUR ~ EST_ST + WEEK + TBIRTH_YEAR + EGENDER + RHISPANIC + 
  RRACE + EEDUC + MS + THHLD_NUMPER + THHLD_NUMKID + THHLD_NUMADLT + WRKLOSS + ANYWORK + 
  KINDWORK + RSNNOWRK + UNEMPPAY + INCOME + TENURE + MORTCONF, data = set_up_weeks15st)

I googled the error for possible solutions and found suggestions like "na.omit", "na.exclude", etc. I tried these to no avail, which leads me to think that I didn't implement them correctly or that something went wrong with the merge itself. While cleaning the data, I set unknown or missing values, listed as -88 or -99 in the data sets, to NA, since I had to create a summary statistics table. I'll attach my R doc. I apologize for the length of the attached code below; I wasn't sure whether to attach only the sections leading up to the regression or to include the other lines as well.

mrivanlima
eak115
  • It's easier to help you if you include a simple, minimal [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Nov 23 '20 at 04:36
  • Sounds like your missing data code is not uniform, that is, different coding schemes for representing missing values were used across the datasets you merged, which is very understandable since they are disparate datasets from different research groups. I would open the merged dataset with Excel, then save it as a "tab-delimited text" file with the filename extension .txt. Then open the myfilename.txt file with Notepad and use search and replace to replace NA with nothing (don't type anything for the replacement). Save the file as Excel or .csv, then retry with R. –  Nov 23 '20 at 04:51

1 Answer


Based on the error message, 0 (non-NA) cases, the likely reason is that every row has at least one NA in the variables used by the model. (This is easy to check: if na.omit(set_up_weeks15st) returns zero rows, no row in the data set is complete.)

In this case, setting na.action to na.omit or na.exclude is not going to help.
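To illustrate the point, here is a minimal sketch on made-up data. The data frame toy and its columns y, x1 and x2 are hypothetical stand-ins, not taken from the question. It counts the complete rows and shows that lm() still stops when na.action = na.omit is supplied, because no rows are left to fit.

```r
# Hypothetical toy data: every row has an NA in at least one column
toy <- data.frame(
  y  = c(1.0, 2.0, 3.0, 4.0),
  x1 = c(NA, 5, NA, 7),
  x2 = c(1, NA, 3, NA)
)

# Number of rows without any missing value
n_complete <- sum(complete.cases(toy))
print(n_complete)

# na.omit() drops every row that contains an NA
n_after_omit <- nrow(na.omit(toy))
print(n_after_omit)

# lm() with na.action = na.omit has nothing left to fit;
# tryCatch() captures the error message instead of stopping the script
fit <- tryCatch(
  lm(y ~ x1 + x2, data = toy, na.action = na.omit),
  error = function(e) conditionMessage(e)
)
print(fit)
```

On the real data, the same check could be restricted to the model variables, for example with complete.cases() applied to only the columns that appear in the formula.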

Try to find the columns with the most NAs and remove them, or impute the missing values using an appropriate method.
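One way to find those columns is to compute the share of NAs per column and sort it. The sketch below uses a hypothetical data frame toy in which x2 is entirely missing, as could happen for a column that exists in only some of the files combined with rbind.fill. The 0.5 cutoff is an arbitrary assumption for illustration, not a recommendation.

```r
# Hypothetical toy data: x2 is entirely NA, x3 has one NA
toy <- data.frame(
  y  = c(1, 2, 3, 4, 5),
  x1 = c(2, 4, 6, 8, 10),
  x2 = rep(NA_real_, 5),
  x3 = c(1, NA, 3, 4, 5)
)

# Share of missing values per column, worst column first
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
print(na_share)

# Keep only columns with at most 50% missing values (arbitrary cutoff)
keep <- names(na_share)[na_share <= 0.5]
toy_reduced <- toy[, keep, drop = FALSE]

# Compare the number of usable rows before and after dropping columns
n_before <- sum(complete.cases(toy))
n_after  <- sum(complete.cases(toy_reduced))
print(c(before = n_before, after = n_after))
```

Whether to drop a variable or impute it depends on how important it is to the model, so the sorted NA shares are best read as a starting point for that decision.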

kangaroo_cliff