I have a data frame, data, with a column named "Project License", which represents a categorical variable and is therefore, in R terminology, a factor. I'm trying to create a new column in which open source software licenses are combined into larger categories according to my own classification. However, when I try to combine (merge) the levels of that factor, I end up with a column in which all levels are lost, a column that is unchanged, or an error message such as the following:
Error in factor(data[["Project License"]], levels = classification, labels = c("Highly Restrictive", : invalid 'labels'; length 4 should be 1 or 6
Here's my code for this functionality (extracted from a function):
myLevels <- c('gpl', 'lgpl', 'bsd',
              'other', 'artistic', 'public')
myLabels <- c('GPL', 'LGPL', 'BSD',
              'Other', 'Artistic', 'Public')
licenses <- factor(data[["Project License"]],
                   levels = myLevels, labels = myLabels)
data[["Project License"]] <- licenses
classification <- c(highly = c('gpl'),
                    restrictive = c('lgpl', 'public'),
                    permissive = c('bsd', 'artistic'),
                    unknown = c('other'))
restrictiveness <-
  factor(data[["Project License"]],
         levels = classification,
         labels = c('Highly Restrictive', 'Restrictive',
                    'Permissive', 'Unknown'))
data[["License Restrictiveness"]] <- restrictiveness
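A minimal, self-contained sketch of where the "length 4 should be 1 or 6" part of the error may come from. It uses a toy vector (hypothetical, not the real data). The second half shows one possible way to merge levels with a named list passed to levels(), which is a base R feature described in ?levels; it is an illustration under those assumptions, not a confirmed fix for the function above.

```r
# c() flattens nested vectors, so this holds six elements, not four groups
classification <- c(highly = c('gpl'),
                    restrictive = c('lgpl', 'public'),
                    permissive = c('bsd', 'artistic'),
                    unknown = c('other'))
length(classification)
names(classification)

# Toy input (hypothetical), using the lowercase codes from the raw data
toy <- factor(c('gpl', 'lgpl', 'bsd', 'other', 'artistic', 'public', 'gpl'))

# A named list keeps the four groups separate; assigning it to levels()
# merges the listed old levels under each new name
levels(toy) <- list('Highly Restrictive' = 'gpl',
                    'Restrictive' = c('lgpl', 'public'),
                    'Permissive' = c('bsd', 'artistic'),
                    'Unknown' = 'other')
table(toy)
```

Note that after the first factor() call in the code above, the levels of "Project License" are the labels ('GPL', 'LGPL', ...), so any grouping applied to that column afterwards would have to use those spellings rather than the lowercase codes.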
I have also tried some other approaches (including the ones described in section 8.2.5 of "R Inferno"), but without success so far.
What am I doing wrong, and how can I solve this problem? Thank you!
UPDATE (Data):
> head(data, n=20)
   Project ID Project License
1       45556            lgpl
2       41636             bsd
3       95627             gpl
4       66930             gpl
5       51103             gpl
6       65637             gpl
7       41834             gpl
8       70998             gpl
9       95064             gpl
10      48810            lgpl
11      95934             gpl
12      90909             gpl
13       6538         website
14      16439             gpl
15      41924             gpl
16      78987             gpl
17      58662            zlib
18       1904             bsd
19      93838          public
20      90047            lgpl
> str(data)
'data.frame': 45033 obs. of 2 variables:
$ Project ID : chr "45556" "41636" "95627" "66930" ...
$ Project License: chr "lgpl" "bsd" "gpl" "gpl" ...
- attr(*, "SQL")=Class 'base64' chr "ClNFTEVDVCBncm91cF9pZCwgbGljZW5zZQpGUk9NIHNmMDMxNC5ncm91cHMKV0hFUkUgZ3JvdXBfaWQgPCAxMDAwMDA="
- attr(*, "indicatorName")=Class 'base64' chr "cHJqTGljZW5zZQ=="
- attr(*, "resultNames")=Class 'base64' chr "UHJvamVjdCBJRCwgUHJvamVjdCBMaWNlbnNl"
UPDATE 2 (Data):
> unique(data[["Project License"]])
[1] "lgpl" "bsd" "gpl" "website" "zlib"
[6] "public" "other" "ibmcpl" "rpl" "mpl11"
[11] "mit" "afl" "python" "mpl" "apache"
[16] "osl" "w3c" "iosl" "artistic" "apsl"
[21] "ibm" "plan9" "php" "qpl" "psfl"
[26] "ncsa" "rscpl" "sunpublic" "zope" "eiffel"
[31] "nethack" "sissl" "none" "opengroup" "sleepycat"
[36] "nokia" "attribut" "xnet" "eiffel2" "wxwindows"
[41] "motosoto" "vovida" "jabber" "cvw" "historical"
[46] "nausite" "real"
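Since the column contains many more license codes than the six listed in myLevels, the following sketch shows what factor() does with values that are missing from levels. The vector raw is a hypothetical sample built from the first few unique codes above, and recoding unlisted values to 'other' is only an assumption about the intended behaviour.

```r
# Hypothetical sample: the first six unique codes from the output above
raw <- c('lgpl', 'bsd', 'gpl', 'website', 'zlib', 'public')
myLevels <- c('gpl', 'lgpl', 'bsd', 'other', 'artistic', 'public')

# Values not present in 'levels' are turned into NA
f <- factor(raw, levels = myLevels)
is.na(f)

# One possible alternative (an assumption about intent):
# map every unlisted code to 'other' before building the factor
raw2 <- ifelse(raw %in% myLevels, raw, 'other')
f2 <- factor(raw2, levels = myLevels)
table(f2, useNA = 'ifany')
```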