My question seems to be related to this thread.

However, the method given there does not work for me.

I define a vector from a dataset as: eduyears1994 <- year1994$q131ed and receive a vector that looks like:

[1] 17 lat/9   1O lat/3,4 1O lat/3,4 17 lat/9   17 lat/9   12 lat/5,6
                                        1O lat/3,4 1O lat/3,4 12 lat/5,6
   9 Levels: Brak formal wykształcenia 4 lata/1 8 lat/2 1O lat/3,4 12 lat/5,6 
     14 lat/7,8 ... BRAK DANYCH

where e.g. "10 lat" stands for 10 years (of education) and "/3,4" most likely stands for the factor label.

I would simply like to have a numeric variable where I have e.g. "10" instead of "10 years" in the column.

I tried the following and received this warning:

eduyears1994n <- as.numeric(as.character(eduyears1994))
Warning message:
NAs introduced by coercion

I also tried to do it manually:

eduyears1994[eduyears1994== "4 lata/1"] <- 4
eduyears1994[eduyears1994== "2"] <- 8
eduyears1994[eduyears1994== "17 lat"] <- 17

but the warning reads:

In `[<-.factor`(`*tmp*`, eduyears1994 == "9", value = 17) :
invalid factor level, NA generated

When I open the file in SPSS I see numbers, not labels, but the variable was somehow specified as nominal, which might be the cause of the problem.

dput(eduyears1994)
c("17 lat/9", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "12 lat/5,6", 
"12 lat/5,6", "17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", 
"1O lat/3,4", "14 lat/7,8", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "17 lat/9", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "12 lat/5,6", 
"8 lat/2", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"17 lat/9", "8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "14 lat/7,8", 
"1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6", 
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "8 lat/2", "17 lat/9", "17 lat/9", "12 lat/5,6", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"4 lata/1", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6", 
"17 lat/9", "17 lat/9", "17 lat/9", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "8 lat/2", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", 
"8 lat/2", "14 lat/7,8", "1O lat/3,4", "8 lat/2", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "17 lat/9", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "14 lat/7,8", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "8 lat/2", 
"4 lata/1", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "8 lat/2", "8 lat/2", "14 lat/7,8", "12 lat/5,6", 
"8 lat/2", "8 lat/2", "14 lat/7,8", "8 lat/2", "14 lat/7,8", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "17 lat/9", 
"8 lat/2", "14 lat/7,8", "1O lat/3,4", "17 lat/9", "1O lat/3,4", 
"8 lat/2", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"4 lata/1", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "17 lat/9", 
"17 lat/9", "17 lat/9", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"12 lat/5,6", "17 lat/9", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "17 lat/9", "17 lat/9", "8 lat/2", "12 lat/5,6", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "1O lat/3,4", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "8 lat/2", "17 lat/9", 
"1O lat/3,4", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "8 lat/2", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "8 lat/2", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "8 lat/2", "14 lat/7,8", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "8 lat/2", "12 lat/5,6", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "12 lat/5,6", "14 lat/7,8", "17 lat/9", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8", 
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"14 lat/7,8", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "8 lat/2", 
"17 lat/9", "17 lat/9", "8 lat/2", "14 lat/7,8", "1O lat/3,4", 
"8 lat/2", "17 lat/9", "17 lat/9", "17 lat/9", "12 lat/5,6", 
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "8 lat/2", 
"8 lat/2", "8 lat/2", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "8 lat/2", "17 lat/9", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "8 lat/2", 
"17 lat/9", "17 lat/9", "14 lat/7,8", "17 lat/9", "1O lat/3,4", 
"17 lat/9", "17 lat/9", "8 lat/2", "1O lat/3,4", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "17 lat/9", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6", "8 lat/2", 
"8 lat/2", "14 lat/7,8", "8 lat/2", "17 lat/9", "12 lat/5,6", 
"1O lat/3,4", "14 lat/7,8", "17 lat/9", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "8 lat/2", 
"17 lat/9", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"14 lat/7,8", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "12 lat/5,6", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "17 lat/9", "12 lat/5,6", 
"8 lat/2", "17 lat/9", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "1O lat/3,4", 
"17 lat/9", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8", "8 lat/2", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "8 lat/2", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "17 lat/9", "8 lat/2", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "8 lat/2", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"8 lat/2", "12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "8 lat/2", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "17 lat/9", "12 lat/5,6", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "8 lat/2", 
"12 lat/5,6", "8 lat/2", "17 lat/9", "8 lat/2", "12 lat/5,6", 
"1O lat/3,4", "17 lat/9", "1O lat/3,4", "17 lat/9", "12 lat/5,6", 
"14 lat/7,8", "17 lat/9", "17 lat/9", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "8 lat/2", "4 lata/1", "12 lat/5,6", "17 lat/9", 
"12 lat/5,6", "17 lat/9", "14 lat/7,8", "14 lat/7,8", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"8 lat/2", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "8 lat/2", "12 lat/5,6", "1O lat/3,4", "8 lat/2", 
"8 lat/2", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "14 lat/7,8", 
"12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", 
"12 lat/5,6", "17 lat/9", "17 lat/9", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"14 lat/7,8", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", "1O lat/3,4", 
"8 lat/2", "1O lat/3,4", "1O lat/3,4", "8 lat/2", "8 lat/2", 
"12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "14 lat/7,8", "1O lat/3,4", 
"17 lat/9", "17 lat/9", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "8 lat/2", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "12 lat/5,6", "8 lat/2", "1O lat/3,4", 
"12 lat/5,6", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"14 lat/7,8", "1O lat/3,4", "17 lat/9", "1O lat/3,4", "1O lat/3,4"
)
Asiack
  • It is not clear why you are changing the `'2'` to `8` while in the other cases ie. with `lat`, the prefix numbers are chosen. – akrun Dec 28 '14 at 20:23
  • I was checking if label 2 will be converted into what it stands for, i.e. 8 years as I was not sure if the error message I receive is because R sees the labels, not the text. @akrun more comments on your solution below your post – Asiack Dec 28 '14 at 20:59
  • Thanks, I got it. The error message you received is because your vector is `factor` and you are trying to assign a `value` or level the factor doesn't have. I would have converted it to `character` class before assigning i.e. `eduyears1994 <- as.character(eduyears1994);eduyears1994[eduyears1994== "17 lat"] <- 17` – akrun Dec 28 '14 at 21:01
  • thanks! it seems that: eduyears1994 <- as.character(year1994$q131ed) eduyears1994[eduyears1994== "17 lat/9"] <- 17, etc. will work! Hope they will be treated as numeric later on, so I can use the vector for a regression analysis. – Asiack Dec 28 '14 at 21:09
  • You have to convert it to numeric by `as.numeric`. You can use either of the first two solutions in my post to convert all the levels in one step. – akrun Dec 28 '14 at 21:10
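
A minimal sketch of the approach suggested in the comments above: convert the factor to character, then recode through a named lookup vector. The names `edu`, `edu_chr`, `years_lookup` and `edu_years` are illustrative only, and the label strings are copied from the `dput()` output in the question (where "1O" is spelled with the letter O). Any label missing from the lookup comes back as `NA`.

```r
# Toy factor built from label strings that appear in the question's dput() output
edu <- factor(c("17 lat/9", "1O lat/3,4", "8 lat/2", "4 lata/1", "12 lat/5,6"))

# Assigning a value that is not an existing level gives NA plus a warning,
# so work on the character representation instead of the factor.
edu_chr <- as.character(edu)

# Named lookup vector: label -> years of education
years_lookup <- c(
  "4 lata/1"   = 4,
  "8 lat/2"    = 8,
  "1O lat/3,4" = 10,
  "12 lat/5,6" = 12,
  "14 lat/7,8" = 14,
  "17 lat/9"   = 17
)

# Index the lookup by label; unname() drops the labels from the result
edu_years <- unname(years_lookup[edu_chr])
edu_years
# [1] 17 10  8  4 12
```

The result is an ordinary numeric vector, so it can be used directly as a regression predictor.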

3 Answers

2

Using your actual data, it appears that you have a character vector of the general format

n lat/a,b

where n is the years, and "a,b" is some kind of label. This will extract the years.

vec <- c("17 lat/9","10 lat/3,4","10 lat/3,4","17 lat/9","17 lat/9","12 lat/5,6","10 lat/3,4","10 lat/3,4","12 lat/5,6")
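# Split each string at the literal separator " lat/"
x <- strsplit(vec, split = " lat/", fixed = TRUE)
# Keep the first piece (the years) of each element, as an integer
sapply(x, function(p) as.integer(p[1]))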
# [1] 17 10 10 17 17 12 10 10 12
jlhoward
  • I think this contradicts the OP's manual code `eduyears1994[eduyears1994== "2"] <- 8` – akrun Dec 28 '14 at 20:31
  • I chose to focus on the sample dataset provided, rather than make one up. In that sample, there is no "4 lat/1", etc. If the sample is not representative, then this answer might not work for OP. – jlhoward Dec 28 '14 at 20:34
  • I was also thinking about using the dataset provided until I saw his code – akrun Dec 28 '14 at 20:36
  • Error in strsplit(eduyears1994, split = " lat/", fixed = TRUE) : non-character argument – Asiack Dec 28 '14 at 21:03
  • @Asiack The `strsplit` needs `character` vector. Try `strsplit(as.character(eduyears1994),...` – akrun Dec 28 '14 at 21:05
  • if I define the first vector as.character I receive: [[693]] [1] "8" "2" [[694]] [1] "1O" "3,4" [[695]] [1] "1O" "3,4", etc. so I would need to remove "3,4" thingies – Asiack Dec 28 '14 at 21:13
  • read all of @jlhoward's answer: the second line does what you're asking for (but it doesn't convert your "2" values to 8 ...) – Ben Bolker Dec 28 '14 at 21:31
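
Putting the points from these comments together, here is a hedged sketch of the `strsplit` approach applied to a factor. It assumes the labels look like those in the question, and it swaps the fixed separator for the regular expression `" lata?/"` so that "4 lata/1" is split as well. The names `edu`, `pieces` and `years` are illustrative.

```r
# A factor, as in the original data; values taken from the question
edu <- factor(c("17 lat/9", "4 lata/1", "8 lat/2", "12 lat/5,6"))

# strsplit() needs a character vector, hence as.character().
# " lata?/" is a regular expression matching both " lat/" and " lata/",
# so fixed = TRUE is deliberately not used here.
pieces <- strsplit(as.character(edu), split = " lata?/")

# Keep only the first piece (the years) and drop the "/3,4"-style suffix
years <- vapply(pieces, function(p) as.integer(p[1]), integer(1))
years
# [1] 17  4  8 12
```

`vapply` is used instead of `sapply` only so that the result is guaranteed to be an integer vector.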
1

You could try

c(17,8,4)[as.numeric(eduyears1994)]
#[1] 17  4 17  4 17 17  4  4 17 17 17 17  4  8  4  4  8  4  8  8

or

 unname(c('4 lata/1'=4, '2'=8, '17 lat' =17)[as.character(eduyears1994)])
 #[1] 17  4 17  4 17 17  4  4 17 17 17 17  4  8  4  4  8  4  8  8

If 8 was in fact a typo, you could use

 library(stringi)
 as.numeric(unlist(stri_extract_all_regex(eduyears1994, '^\\d+')))
 #[1] 17  4 17  4 17 17  4  4 17 17 17 17  4  2  4  4  2  4  2  2

data

set.seed(21)
eduyears1994 <- factor(sample(c('4 lata/1', 2, '17 lat'), 20, replace=TRUE))
akrun
1

Using @akrun's example:

set.seed(21)
eduyears1994 <- factor(sample(c('4 lata/1', 2, '17 lat'), 20, replace=TRUE))

Using gsub and an (apparently) appropriate regular expression (* denotes "0 or more of the preceding character or pattern", so e.g. "lata*" matches "lat" or "lata")

as.numeric(gsub(" lata*[/0-9,]*","",eduyears1994))

warning: this converts "2" into 2, not 8, which is not what you asked for. I'm not quite sure by what logic you convert "4 lata/1" to 4, "17 lat" to 17, and "2" to 8 -- perhaps you could explain? Maybe that was a typo?

Ben Bolker
  • I was checking if label 2 will be converted into what it stands for, i.e. 8 years. gsub yields the following: > as.numeric(gsub(" lata*/*[0-9,]*","",eduyears1994)) [1] 17 NA NA 17 17 12 NA NA 12 14 NA NA NA 17 12 12 17 NA 12 12 12 12 12 17 NA NA 14 17 NA NA 12 12 17 [34] 17 17 17 12 12 14 12 8 NA 12 8 17 NA 12 NA NA NA 17 8 8 NA NA 12 12 17 NA 14 NA 14 NA NA 17 12 – Asiack Dec 28 '14 at 20:46
  • I'm surprised you get all of those `NA` values -- with my example, everything gets converted. Can you show the results of `dput(eduyears1994)` ? Can you explain more about why you get labels like "2" (which corresponds to 8 years for reasons I don't understand) mixed with text labels like "17 lat" ? – Ben Bolker Dec 28 '14 at 21:29
  • Sorry for the lag. I think the data was specified as nominal instead of numeric and R reads it in in a strange way. The results of dput(eduyears1994) where eduyears1994 <- as.character(year1994$q131ed) are in the post, as the comment field is too short for it. – Asiack Jan 01 '15 at 21:38
  • There are 9 Levels or what I call "labels": no education / 0, 4 years / 1, 8 years / 2, 10 years / 3,4 (I guess it depends on where the additional 2 years after the first 8 years of primary school come from), 12 years / 5,6 , 14 years / 7,8 – Asiack Jan 01 '15 at 21:44
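
A possible explanation for the `NA` values, offered as an observation rather than a confirmed diagnosis: in the `dput()` output in the question, the ten-year label appears as "1O lat/3,4" with a capital letter O instead of the digit 0, and the `NA` positions in the `gsub` output above line up with exactly those entries. The sketch below, with illustrative names `edu_chr`, `token`, `bad` and `years`, first lists the tokens that are not purely digits and then converts after replacing O with 0.

```r
# A few labels copied from the question's dput() output
edu_chr <- c("17 lat/9", "1O lat/3,4", "12 lat/5,6", "4 lata/1", "8 lat/2")

# Everything before the first space is the "years" token
token <- sub(" .*$", "", edu_chr)

# Tokens that are not made of digits only are the ones as.numeric() turns into NA
bad <- unique(token[!grepl("^[0-9]+$", token)])
bad
# [1] "1O"

# Replace the letter O with the digit 0, then convert to numeric
years <- as.numeric(chartr("O", "0", token))
years
# [1] 17 10 12  4  8
```

Labels without a leading number, such as "BRAK DANYCH", would still become `NA` with a coercion warning under this approach.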