Unique data frame but keep complete cases

Question

I have a data frame as below:

df1
ID    NAME   LOCATION
101   Jack   Netherlands
102   Jack     NA
104   Tom      NA
105   Tom     
123   Sam     
124   Sam      NA
134   Dan     
135   Dan    Germany

I would like to have an output like this:

df2
ID    NAME   LOCATION
101   Jack   Netherlands
104   Tom      NA
124   Sam      NA
135   Dan    Germany

Thanks for your help.

`df2 <- df1[complete.cases(df1),]` – Ryan Morton, Jan 26 '17 at 18:34

score 3 · Accepted Answer · edited May 23 '17 at 12:09

You seem to have two types of missing data, some marked NA (which you still consider "complete") and some marked "" (which you want to omit).

R's convention is the opposite of yours: rows containing NA are not considered complete, while the empty string "" is perfectly valid data. I would recommend matching R's convention while using R. Replace the NA values in your data frame with a string (maybe "missing" or "not applicable"), and replace the empty strings in your data with NA, since those are the ones you consider missing. Then `complete.cases` will work perfectly for you as suggested in the comments: `df2 <- df1[complete.cases(df1), ]`
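As a minimal sketch of that swap, the code below rebuilds `df1` by hand. The column classes are an assumption (the question does not show them): `LOCATION` is taken to be a character column in which the blank cells are empty strings and the `NA` cells are real `NA` values. The label "missing" is an arbitrary placeholder.

```r
# Hypothetical reconstruction of df1; the real column classes are unknown.
df1 <- data.frame(
  ID = c(101, 102, 104, 105, 123, 124, 134, 135),
  NAME = c("Jack", "Jack", "Tom", "Tom", "Sam", "Sam", "Dan", "Dan"),
  LOCATION = c("Netherlands", NA, NA, "", "", NA, "", "Germany"),
  stringsAsFactors = FALSE
)

# Record both kinds of "missing" before changing anything, so the two
# replacements cannot interfere with each other.
is_na    <- is.na(df1$LOCATION)
is_blank <- !is_na & df1$LOCATION == ""

df1$LOCATION[is_na]    <- "missing"  # placeholder label, an arbitrary choice
df1$LOCATION[is_blank] <- NA         # blanks now follow R's convention

# Keep only the rows that have no NA in any column.
df2 <- df1[complete.cases(df1), ]
```

Under these assumptions `df2` keeps IDs 101, 102, 104, 124 and 135. Both Jack rows survive, so matching the desired output exactly would still need a further step that picks one row per NAME.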

You can use the `replace` function to make these changes to your data column. If your data column is a factor, you could instead edit the levels (or just convert it to character and use the `replace` function). If you share your data reproducibly with `dput()` (see here for details) I'll be happy to show some more explicit code, but as-is I'm not sure of the structure and underlying classes in your data.
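Since the actual classes are unknown, here is only a rough sketch of the convert-to-character route, using a hypothetical factor vector `loc` that mirrors the LOCATION column. The "missing" label is again just a placeholder.

```r
# Hypothetical factor column mirroring LOCATION in the question.
loc <- factor(c("Netherlands", NA, NA, "", "", NA, "", "Germany"))

# Convert to character first so new values are not limited to existing levels.
loc <- as.character(loc)

# Order matters: relabel the true NAs before turning blanks into NA.
loc <- replace(loc, is.na(loc), "missing")
loc <- replace(loc, loc %in% "", NA)  # %in% never returns NA, unlike ==
```

Doing the two replacements in the other order would merge the blanks with the original NA values, and both would end up labelled "missing".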
