I have merged two data frames in R, and when I check str(data), it shows that some factors have hundreds of levels, even though the data frame itself contains only 21 rows.
'data.frame': 21 obs. of 6 variables:
$ TrustName : Factor w/ 382 levels "#NAME?","2Gether NHS Foundation Trust",..: 14 17 18 55 73 93 104 107 116 121 ...
$ TrustCode : Factor w/ 317 levels " ","00D","00P",..: 134 86 122 205 154 241 194 152 208 306 ...
$ ResponseRate16: Factor w/ 70 levels "--","100","28",..: 18 21 17 23 8 31 35 13 30 17 ...
$ Base16 : Factor w/ 300 levels "--","1,039","1,057",..: 232 73 191 216 147 194 4 70 143 6 ...
$ ResponseRate15: Factor w/ 34 levels "27.29%","27.63%",..: 18 5 13 31 3 15 34 9 12 10 ...
$ Base15 : Factor w/ 34 levels "1,279","1,456",..: 23 7 18 12 31 19 28 6 15 32 ...
The factors with 300+ levels are the ones in question as they only contain between 20 and 30 values in total.
I even removed the #NAME? values from the original data frames before merging them, and I checked that they had been removed successfully (they had), yet "#NAME?" is still listed as a level of TrustName.
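In case it helps, here is a small made-up example that seems to reproduce the same behaviour. The data frame df, its column names, and its values are invented for illustration and are not my real data; I am also assuming the columns were read in as factors, which I force here with stringsAsFactors = TRUE.

```r
# Made-up data: three rows, one of them the unwanted "#NAME?" entry.
# stringsAsFactors = TRUE is an assumption so the text column becomes a factor.
df <- data.frame(
  TrustName = c("#NAME?", "Trust A", "Trust B"),
  Score     = c(1, 2, 3),
  stringsAsFactors = TRUE
)

# Remove the unwanted row, as I did before merging
cleaned <- df[df$TrustName != "#NAME?", ]

nrow(cleaned)                       # rows left after filtering
any(cleaned$TrustName == "#NAME?")  # is the value itself still present?
levels(cleaned$TrustName)           # levels recorded on the factor
str(cleaned)
```

In this sketch the "#NAME?" row is gone from the data, but "#NAME?" is still listed among the levels of TrustName, which looks like the same thing I am seeing on the merged data.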
Why is this happening, and how can I fix it so that str() reports only the levels that are actually present in the data?