I have merged two data frames in R. When I run str(data), some factors show hundreds of levels, even though the merged data frame only contains 21 rows.

'data.frame':   21 obs. of  6 variables:
$ TrustName     : Factor w/ 382 levels "#NAME?","2Gether NHS Foundation Trust",..: 14 17 18 55 73 93 104 107 116 121 ...
$ TrustCode     : Factor w/ 317 levels " ","00D","00P",..: 134 86 122 205 154 241 194 152 208 306 ...
$ ResponseRate16: Factor w/ 70 levels "--","100","28",..: 18 21 17 23 8 31 35 13 30 17 ...
$ Base16        : Factor w/ 300 levels "--","1,039","1,057",..: 232 73 191 216 147 194 4 70 143 6 ...
$ ResponseRate15: Factor w/ 34 levels "27.29%","27.63%",..: 18 5 13 31 3 15 34 9 12 10 ...
$ Base15        : Factor w/ 34 levels "1,279","1,456",..: 23 7 18 12 31 19 28 6 15 32 ...

The factors with 300+ levels are the ones in question, since each of them only holds between 20 and 30 values in total.

I even removed the #NAME? values from the original data frames before merging them, and I checked that they had been removed successfully (they had).

Why is this happening, and how can I fix it so that str() reflects only the levels that are actually present?
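
For illustration, here is a minimal sketch of what is probably going on, assuming the columns were read in as factors before the merge. A factor keeps its full set of levels when rows are removed, so the level count reported by str() can be much larger than the number of values still present. The data frame `full` and its values are hypothetical stand-ins, not the real data; `droplevels()` is base R.

```r
# Hypothetical toy data standing in for one of the original data frames
full <- data.frame(
  TrustCode = factor(c("00D", "00P", "RA2", "RA3", "RA4")),
  Base16    = c(10, 20, 30, 40, 50)
)

# Keeping only some rows does not shrink the factor's level set
small <- full[full$TrustCode %in% c("RA2", "RA3"), ]
levels_before <- nlevels(small$TrustCode)  # counts unused levels too

# droplevels() discards levels that no longer have any observations
small <- droplevels(small)
levels_after <- nlevels(small$TrustCode)   # only levels still in use
```

On the merged data frame, the equivalent call would be `data <- droplevels(data)`, after which str(data) should report only the levels that occur in the 21 rows.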

lmo
Mus