I am fairly new to R and am having trouble with the unique identifiers for my participants. I imported my dataset from Stata into R without problems, and all variables appear as they should, including my ID variable. However, when I try to run PCA, it treats my ID variable as one of the item variables. Does anyone know what the problem might be? The ID variable is currently of type chr (character). I thought R automatically recognized unique IDs for participants?

Machavity
  • Welcome! It would be very helpful if you could share at least a snippet of your data and say which PCA function you are using (`prcomp`?). Odds are you just need to tell the function not to include ID in your analysis, but exactly how depends on your data and which command you're using. – Chuck P Jun 03 '20 at 15:25
  • Please show your code with a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5965451). We need to see how you are *trying to run PCA*. R is an extensible language and may have several PCA implementations. Include all `library` lines. – Parfait Jun 03 '20 at 15:55

1 Answer

R does not really have a concept of a unique identifier like an SQL primary key. Instead, you have a few options. You can exclude the identifier from the data you feed into PCA like this:

df_for_pca = df[, 2:ncol(df)] # assuming id is the first column of df
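
To make this concrete, here is a minimal sketch using base R's `prcomp`. The question does not say which PCA function was used, so `prcomp` is an assumption, and the toy data frame with its column names (`id`, `item1`, `item2`, `item3`) is made up for illustration:

```r
# Hypothetical toy data: 'id' is a character column, the rest are numeric items.
set.seed(1)
df <- data.frame(
  id    = paste0("P", 1:6),
  item1 = rnorm(6),
  item2 = rnorm(6),
  item3 = rnorm(6),
  stringsAsFactors = FALSE
)

# Drop the first column (the id). drop = FALSE keeps the result a data frame
# even if only a single column is left.
df_for_pca <- df[, -1, drop = FALSE]

# prcomp() is the PCA function in base R's stats package; it needs numeric input.
pca <- prcomp(df_for_pca, center = TRUE, scale. = TRUE)
summary(pca)
```

Only the item columns end up in `pca$rotation`, so the ID no longer takes part in the analysis.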

You can also store the identifier as row names, which most functions do not treat as data but which are retained for when you need them:

rownames(df) = df[, 1] # assuming id is the first column of df
df[, 1] = NULL
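
As a rough illustration of why row names are convenient here, the sketch below (again assuming `prcomp` and made-up data) shows the IDs carrying through to the component scores:

```r
# Hypothetical toy data with a character id in the first column.
df <- data.frame(
  id    = c("P1", "P2", "P3", "P4", "P5"),
  item1 = c(2.1, 3.4, 1.9, 4.2, 3.0),
  item2 = c(0.5, 1.1, 0.3, 1.8, 0.9),
  stringsAsFactors = FALSE
)

# Move the id into the row names, then remove the id column.
rownames(df) <- df[[1]]
df[[1]] <- NULL

# The id is no longer a variable, but it still labels each row of the scores.
pca <- prcomp(df, scale. = TRUE)
rownames(pca$x)
```

Note that row names of a data frame must be unique, so this only works if every participant ID appears exactly once.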

EDIT: a solution from the comments using the packages textshape and tibble (the `%>%` pipe comes from magrittr, which is also loaded by dplyr and the tidyverse):

# assuming the id column is called 'my_id'
df_for_pca = df %>% remove_rownames() %>% column_to_rownames(var = 'my_id')
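
A self-contained version of the same idea, sketched with only the tibble package and no pipe, might look like this. The data and the column name `my_id` are hypothetical, and the tibble package is assumed to be installed:

```r
library(tibble)

# Hypothetical toy data; 'my_id' is the participant identifier.
df <- data.frame(
  my_id = c("P1", "P2", "P3", "P4"),
  item1 = c(1.2, 0.7, 2.3, 1.9),
  item2 = c(3.1, 2.8, 3.9, 3.3),
  stringsAsFactors = FALSE
)

# remove_rownames() clears any existing row names first, because
# column_to_rownames() refuses to overwrite row names that are already set.
df_for_pca <- column_to_rownames(remove_rownames(df), var = "my_id")

rownames(df_for_pca)
names(df_for_pca)
```
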
Thomas Rosa
  • Hello everyone! Thank you so much for being so quick in responding. I tried the following code using 'tidyverse' and it worked! `example1 <- hfitrial1 %>% remove_rownames() %>% column_to_rownames(var = 'ptid')` – Nadia Koyratty Jun 04 '20 at 16:54