Extracting values from a txt file in R

Question

I have a txt file with the following format:

(4, 'AF', 'AFG', 'Afghanistan'),
(248, 'AX', 'ALA', 'Aland Islands'),
               .
               .
               .

I want to extract the number and the country. My idea is to use gsub with "[^0-9]" to find the number and something like tail(strsplit()) to extract the last word, after first removing all the special characters, of course. Is there a quicker way?

Data:

structure(list(V1 = c("(4, 'AF', 'AFG', 'Afghanistan'),", "(248, 'AX', 'ALA', 'Aland Islands'),", 
"(8, 'AL', 'ALB', 'Albania'),", "(12, 'DZ', 'DZA', 'Algeria'),", 
"(16, 'AS', 'ASM', 'American Samoa'),", "(20, 'AD', 'AND', 'Andorra'),"
)), .Names = "V1", row.names = c(NA, 6L), class = "data.frame")

try a `strsplit` on the `,` then take the first and the fourth column ? — etienne, Nov 17 '15 at 13:31
@mpizosdimitris can you put a dput of (the head of) your data in the question? Makes solving things easier. — Heroka, Nov 17 '15 at 14:18
[How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) — zx8754, Nov 17 '15 at 14:31
I did not ask for a solution. I asked for an alternative method. I don't see a reason why you downvoted. Anyway, thanks for your feedback. — Mpizos Dimitris, Nov 17 '15 at 14:33

score 0 · Accepted Answer · answered Nov 17 '15 at 14:54

If your data.frame is called df, here is a way using regex:

Get the first number:

sub("^\\((\\d+).*", "\\1", df$V1)
#[1] "4"   "248" "8"   "12"  "16"  "20"
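
Note that sub() returns character strings, as the quoted output shows. If the numbers are needed as numeric values, a minimal sketch (using a two-row sample that I assume mirrors the question's data) could wrap the call in as.integer():

```r
# Two-row sample in the same shape as the question's data (assumed, for illustration).
df <- data.frame(
  V1 = c("(4, 'AF', 'AFG', 'Afghanistan'),",
         "(248, 'AX', 'ALA', 'Aland Islands'),"),
  stringsAsFactors = FALSE
)

# sub() keeps only the digits captured after the opening parenthesis;
# as.integer() converts the resulting strings to integers.
num <- as.integer(sub("^\\((\\d+).*", "\\1", df$V1))
num
```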

Get the country:

sub("[^a-z]+([A-Z][a-z A-Z]+).+", "\\1", df$V1)
#[1] "Afghanistan"    "Aland Islands"  "Albania"        "Algeria"        "American Samoa" "Andorra"
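
As a possible alternative, both values can be captured in one pass with regexec() and regmatches(). This is only a sketch, not part of the original answer: it assumes every line has the shape shown in the question, that the country is the last single-quoted field, and that the country name contains no single quote. The names `pattern`, `parts`, and `result` are made up for the example.

```r
# Two-row sample in the same shape as the question's data (assumed, for illustration).
df <- data.frame(
  V1 = c("(4, 'AF', 'AFG', 'Afghanistan'),",
         "(248, 'AX', 'ALA', 'Aland Islands'),"),
  stringsAsFactors = FALSE
)

# Group 1: the leading number. Group 2: the last quoted field before ")".
pattern <- "^\\(([0-9]+),.*'([^']+)'\\),?$"

# regmatches() returns one character vector per line:
# full match, group 1, group 2. Lines that do not match yield character(0),
# so this sketch assumes every line matches.
m <- regmatches(df$V1, regexec(pattern, df$V1))
parts <- do.call(rbind, m)

result <- data.frame(
  num = as.integer(parts[, 2]),
  country = parts[, 3],
  stringsAsFactors = FALSE
)
result
```

Because the country is anchored to the last quoted field rather than to a letter class, this variant should also tolerate names with hyphens or commas, but it has not been checked against the full file.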
