Why are there extra levels listed in str(data) which aren't visible/accessible?

Question

I have merged two data frames in R. When I check str(data), some factors are reported as having hundreds of levels, even though the data frame itself only contains 21 rows.

'data.frame':   21 obs. of  6 variables:
$ TrustName     : Factor w/ 382 levels "#NAME?","2Gether NHS Foundation Trust",..: 14 17 18 55 73 93 104 107 116 121 ...
$ TrustCode     : Factor w/ 317 levels " ","00D","00P",..: 134 86 122 205 154 241 194 152 208 306 ...
$ ResponseRate16: Factor w/ 70 levels "--","100","28",..: 18 21 17 23 8 31 35 13 30 17 ...
$ Base16        : Factor w/ 300 levels "--","1,039","1,057",..: 232 73 191 216 147 194 4 70 143 6 ...
$ ResponseRate15: Factor w/ 34 levels "27.29%","27.63%",..: 18 5 13 31 3 15 34 9 12 10 ...
$ Base15        : Factor w/ 34 levels "1,279","1,456",..: 23 7 18 12 31 19 28 6 15 32 ...

The factors with 300+ levels are the ones in question as they only contain between 20 and 30 values in total.

I even cleaned/removed the #NAME? values from the original dataframes before merging them and I checked that they had been removed successfully (they had).

Why is this happening, and how can I fix it so that str(data) gives a more accurate picture?

What did you do exactly? Did you check `str(droplevels(data))` and it shows the same result? — A5C1D2H2I1M1N2O1R2T1, Mar 17 '17 at 13:15
And what do you mean by "merged two dataframes"? Did the original dataframes already contain these factors with the large numbers of levels, or were they added in the merging? — Marijn Stevering, Mar 17 '17 at 13:16
Not to worry, I have discovered that I need to apply `factor()` to the variable once I have merged the dataframes: http://stackoverflow.com/a/1197154/636987 — Mus, Mar 17 '17 at 13:16
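
For anyone landing here with the same problem, the fix mentioned in the comments can be sketched as follows. A factor stores its full set of levels separately from its values, so removing rows does not remove the levels those rows used. The data frame `trusts` and its contents are made up for illustration and are not the asker's data; only `factor()`, `droplevels()` and `nlevels()` are real base R functions. Whether the extra levels in the question came from the cleaning step or from the merge itself is not stated, so treat this as a minimal sketch of the general behaviour.

```r
# Hypothetical stand-in for one of the original data frames
trusts <- data.frame(
  TrustName = factor(c("#NAME?", "Trust A", "Trust B", "Trust C")),
  Base16    = c(NA, 120, 340, 560)
)

# Remove the bad row: the row is gone, but its level is still recorded
cleaned <- trusts[trusts$TrustName != "#NAME?", ]
nlevels(cleaned$TrustName)         # 4, "#NAME?" is now an unused level

# Option 1: rebuild a single factor so only the levels in use remain
via_factor <- factor(cleaned$TrustName)
nlevels(via_factor)                # 3

# Option 2: drop unused levels from every factor column at once
via_droplevels <- droplevels(cleaned)
nlevels(via_droplevels$TrustName)  # 3
```

`droplevels(data)` is usually the more convenient choice when several columns are affected, as in the `str()` output above, while `factor()` works on one column at a time.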
