I have a very large data frame (let's say 8 rows by 10,000 columns) that is filled with strings. I want to map each unique string to a number and replace every occurrence of that string with its number.
For example, if I had a dataframe:
   X1   X2       X3
1  cat  mouse    rabbit
2  dog  cat,dog  dog
I'd like to convert it to:
   X1  X2  X3
1  1   2   3
2  4   5   4
Note the combined label of "cat,dog" gets its own unique number. The actual numbering of each string is irrelevant as I'm doing this for an inter-rater reliability calculation.
Short of getting all the unique elements, assigning each one a number, and replacing them, is there a more elegant way to do this?
Also, if an element is blank, e.g. "", it should be converted to NA in the numeric data frame.
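
For reference, the manual approach described above (collect the unique strings, number them, replace) might look roughly like the sketch below. It assumes the data is an R data frame of character columns; the names df, vals, lev and num_df are hypothetical and only for illustration, and the example data is the small table from this question.

```r
# Hypothetical example data matching the table in the question
df <- data.frame(
  X1 = c("cat", "dog"),
  X2 = c("mouse", "cat,dog"),
  X3 = c("rabbit", "dog"),
  stringsAsFactors = FALSE
)

# Work on a character matrix so all columns share one set of codes
vals <- as.matrix(df)

# Treat blank strings as missing
vals[which(vals == "")] <- NA

# Unique strings in row-wise order of first appearance, without NA
lev <- unique(as.vector(t(vals)))
lev <- lev[!is.na(lev)]

# match() returns the position of each string in lev; NA stays NA
codes <- match(vals, lev)

# Rebuild a data frame with the original shape and column names
num_df <- as.data.frame(
  matrix(codes, nrow = nrow(vals), dimnames = dimnames(vals))
)
```

Because the numbering itself is irrelevant here, the row-wise ordering step is optional; it is only included so the codes follow the order used in the example table.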