-1

My data frame has a column holding the genre of each movie, and there are many distinct genres. I want to convert it to numerical data so that I can plot a correlation matrix. Please help me do that.

Genre         Genre_numerical
Comedy        1
Action        2
Suspense      3
Comedy        1
Biography     4
guscht
lala

2 Answers

0

In R, you can turn categorical data into a factor. That is a basic thing to do (or to avoid until the last possible moment) in R. Look into ordered and unordered factors if you need to brush up on them.
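As a minimal sketch of the factor approach, assuming you want the codes numbered in order of first appearance as in the question's table (the vector `genre` below is a hypothetical sample, not your real data):

```r
# Hypothetical sample mirroring the table in the question
genre <- c("Comedy", "Action", "Suspense", "Comedy", "Biography")

# Fix the level order to order of first appearance; by default
# factor() would sort the levels alphabetically instead.
genre_factor <- factor(genre, levels = unique(genre))

# The integer codes behind the factor serve as the numeric labels
genre_numerical <- as.integer(genre_factor)

data.frame(Genre = genre, Genre_numerical = genre_numerical)
```

Keep in mind that these codes are arbitrary labels for an unordered factor, so a correlation computed on them depends on the numbering you happened to choose.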

Your question seems to be more about how to correlate categorical data.

Look at this answer and then read the thread: Plot the equivalent of correlation matrix for factors (categorical data)? And mixed types?

Strength of association is calculated for nominal vs nominal with a bias corrected Cramer's V, numeric vs numeric with Spearman (default) or Pearson correlation, and nominal vs numeric with ANOVA. - @Holger Brandl
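To give a rough idea of the nominal vs nominal case, here is a sketch of the plain (uncorrected) Cramér's V in base R. It is not the bias-corrected variant the quote refers to, the helper name `cramers_v` is made up for this example, and `rating` is an invented second categorical column:

```r
# Uncorrected Cramer's V between two categorical vectors (base R only).
# Note: this is NOT the bias-corrected variant mentioned in the quote.
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # Chi-squared statistic without continuity correction; warnings about
  # small expected counts are suppressed for this toy example.
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)                      # total number of observations
  k <- min(nrow(tab), ncol(tab))     # smaller dimension of the table
  unname(sqrt(chi2 / (n * (k - 1))))
}

# Hypothetical data: 'rating' is an invented second categorical column
genre  <- c("Comedy", "Action", "Suspense", "Comedy", "Biography", "Action")
rating <- c("PG", "R", "R", "PG", "PG", "R")

cramers_v(genre, rating)
```

The value lies between 0 (no association) and 1 (one variable fully determines the other), so it can fill the cells of a correlation-style matrix for factor columns.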

0

Here are two solutions: one uses base R, the other uses dplyr.

Illustrative data:

set.seed(123)
df <- data.frame(Genre = sample(c("Comedy", "Action", "Suspense", "Biography"), 10, replace = TRUE))

Solution #1:

You can assign numerical values to your Genre categories using ifelse:

df$Genre_numerical <- ifelse(df$Genre == "Comedy", 1,
                            ifelse(df$Genre == "Action", 2,
                                   ifelse(df$Genre == "Suspense", 3, 4)))
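If the nested ifelse calls become unwieldy with many genres, a named lookup vector is one possible base R alternative. This is only a sketch: `genre_codes` and `df_demo` are names invented for the example, and the mapping is the same one used above.

```r
# Named lookup vector: category name -> numeric code
genre_codes <- c(Comedy = 1, Action = 2, Suspense = 3, Biography = 4)

# Small hypothetical data frame for illustration
df_demo <- data.frame(Genre = c("Comedy", "Action", "Suspense", "Comedy", "Biography"))

# as.character() guards against Genre being a factor, in which case
# indexing would use the factor's integer codes rather than its labels.
df_demo$Genre_numerical <- unname(genre_codes[as.character(df_demo$Genre)])

df_demo
```

Unlike the ifelse chain, which sends every unmatched genre to 4, a genre missing from the lookup vector becomes NA here, which makes unexpected categories easier to spot.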

Solution #2:

library(dplyr)
# Add Genre_numerical as a new column; Genre itself is left unchanged
df <- df %>% 
  mutate(Genre_numerical = case_when(Genre == "Comedy"   ~ 1,
                                     Genre == "Action"   ~ 2,
                                     Genre == "Suspense" ~ 3, 
                                     TRUE                ~ 4))

Result:

The result is the same in either case:

df
       Genre Genre_numerical
1     Action               2
2  Biography               4
3     Action               2
4  Biography               4
5  Biography               4
6     Comedy               1
7   Suspense               3
8  Biography               4
9   Suspense               3
10    Action               2
Chris Ruehlemann