0

I have a data table listing data by country, and I want to add region information. I made the default "Europe" so I then need to correct those rows which are not in the "Europe" region. Here's what I need to assign:

    ordered_tally[ , region :="Europe"]
    AsiaRows = c(5, 12, 17, 24, 25, 26, 34, 36, 45, 56, 61, 64, 75, 76, 81, 88, 90, 91)
    NorthAmericaRows = c(7, 18, 42, 47)
    SouthAmericaRows = c(11, 13, 32, 38, 41, 46, 48, 52, 53, 57, 58, 68, 77, 78, 83)
    AfricaRows = c(22, 66, 67, 70, 74, 82, 84, 86, 95)
    OceaniaRows = c(31, 62)
    MiddleEastRows = c(20, 51, 71, 72, 87, 89, 92)
    region = c("Asia", "NorthAmerica", "SouthAmerica", "Africa", "Oceania", "MiddleEast")
    region_indices = c(AsiaRows, NorthAmericaRows, SouthAmericaRows, AfricaRows, OceaniaRows, MiddleEastRows)

Here's what worked for me:

lapply(c(1:6), function(x) ordered_tally[region_indices[[x]], "region":= eval(region[x])])

Is there a better way to do this? I have the sense lapply shouldn't be necessary and that there must be a native data.table operation I'm missing.

Here's ordered_tally:

    V1              country  freq
 1:  1               Poland 27718
 2:  2                Spain  9800
 3:  3              Denmark  7551
 4:  4       United Kingdom  4910
 5:  5             Thailand  4069
 6:  6               Norway  3995
 7:  7        United States  3837
 8:  8          Netherlands  3075
 9:  9               France  2866
10: 10                Italy  2429
11: 11               Brazil  2109
12: 12                India  1804
13: 13            Argentina  1610
14: 14              Germany  1437
15: 15               Sweden  1409
16: 16            Lithuania  1339
17: 17               Taiwan  1241
18: 18               Mexico  1218
19: 19               Latvia  1152
20: 20               Israel  1076
21: 21       Czech Republic  1064
22: 22         South Africa  1036
23: 23             Portugal   897
24: 24             Malaysia   889
25: 25            Indonesia   851
26: 26            Hong Kong   722
27: 27               Russia   670
28: 28          Switzerland   635
29: 29              Finland   613
30: 30              Ireland   603
31: 31            Australia   588
32: 32                Chile   573
33: 33              Estonia   540
34: 34                China   514
35: 35              Belgium   498
36: 36            Singapore   488
37: 37              Austria   464
38: 38             Colombia   361
39: 39              Belarus   330
40: 40              Romania   330
41: 41            Guatemala   311
42: 42               Canada   295
43: 43             Bulgaria   288
44: 44               Greece   246
45: 45          South Korea   231
46: 46           Costa Rica   218
47: 47          Puerto Rico   218
48: 48            Venezuela   204
49: 49               Serbia   203
50: 50               Turkey   184
51: 51 United Arab Emirates   175
52: 52              Ecuador   172
53: 53                 Peru   161
54: 54              Hungary   136
55: 55           Montenegro   131
56: 56                Japan   127
57: 57          El Salvador    95
58: 58              Uruguay    84
59: 59               Cyprus    70
60: 60              Iceland    59
61: 61           Bangladesh    57
62: 62          New Zealand    47
63: 63              Moldova    42
64: 64          Philippines    41
65: 65             Slovakia    36
66: 66            Mauritius    32
67: 67              Nigeria    30
68: 68              Bolivia    25
69: 69              Croatia    25
70: 70             Zimbabwe    24
71: 71         Saudi Arabia    20
72: 72                 Oman    19
73: 73             Slovenia    19
74: 74                Egypt    15
75: 75              Vietnam    14
76: 76            Sri Lanka    10
77: 77   Dominican Republic     9
78: 78            Nicaragua     9
79: 79    Macedonia (FYROM)     5
80: 80              Ukraine     5
81: 81           Azerbaijan     4
82: 82              Namibia     4
83: 83             Barbados     3
84: 84             Botswana     3
85: 85                Malta     3
86: 86                Ghana     2
87: 87                 Iraq     2
88: 88           Kazakhstan     2
89: 89              Lebanon     2
90: 90                Macau     2
91: 91                Nepal     2
92: 92                Qatar     2
93: 93              Albania     1
94: 94        Faroe Islands     1
95: 95              Morocco     1

and here's the dump code to reproduce:

ordered_tally <-
data.table(V1 = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", 
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", 
"42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", 
"53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", 
"64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74", 
"75", "76", "77", "78", "79", "80", "81", "82", "83", "84", "85", 
"86", "87", "88", "89", "90", "91", "92", "93", "94", "95"), 
    country = c("Poland", "Spain", "Denmark", "United Kingdom", 
    "Thailand", "Norway", "United States", "Netherlands", "France", 
    "Italy", "Brazil", "India", "Argentina", "Germany", "Sweden", 
    "Lithuania", "Taiwan", "Mexico", "Latvia", "Israel", "Czech Republic", 
    "South Africa", "Portugal", "Malaysia", "Indonesia", "Hong Kong", 
    "Russia", "Switzerland", "Finland", "Ireland", "Australia", 
    "Chile", "Estonia", "China", "Belgium", "Singapore", "Austria", 
    "Colombia", "Belarus", "Romania", "Guatemala", "Canada", 
    "Bulgaria", "Greece", "South Korea", "Costa Rica", "Puerto Rico", 
    "Venezuela", "Serbia", "Turkey", "United Arab Emirates", 
    "Ecuador", "Peru", "Hungary", "Montenegro", "Japan", "El Salvador", 
    "Uruguay", "Cyprus", "Iceland", "Bangladesh", "New Zealand", 
    "Moldova", "Philippines", "Slovakia", "Mauritius", "Nigeria", 
    "Bolivia", "Croatia", "Zimbabwe", "Saudi Arabia", "Oman", 
    "Slovenia", "Egypt", "Vietnam", "Sri Lanka", "Dominican Republic", 
    "Nicaragua", "Macedonia (FYROM)", "Ukraine", "Azerbaijan", 
    "Namibia", "Barbados", "Botswana", "Malta", "Ghana", "Iraq", 
    "Kazakhstan", "Lebanon", "Macau", "Nepal", "Qatar", "Albania", 
    "Faroe Islands", "Morocco"), freq = c(27718L, 9800L, 7551L, 
    4910L, 4069L, 3995L, 3837L, 3075L, 2866L, 2429L, 2109L, 1804L, 
    1610L, 1437L, 1409L, 1339L, 1241L, 1218L, 1152L, 1076L, 1064L, 
    1036L, 897L, 889L, 851L, 722L, 670L, 635L, 613L, 603L, 588L, 
    573L, 540L, 514L, 498L, 488L, 464L, 361L, 330L, 330L, 311L, 
    295L, 288L, 246L, 231L, 218L, 218L, 204L, 203L, 184L, 175L, 
    172L, 161L, 136L, 131L, 127L, 95L, 84L, 70L, 59L, 57L, 47L, 
    42L, 41L, 36L, 32L, 30L, 25L, 25L, 24L, 20L, 19L, 19L, 15L, 
    14L, 10L, 9L, 9L, 5L, 5L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 1L, 1L, 1L), user_count = c(4126L, 1533L, 
    1312L, 870L, 620L, 867L, 668L, 493L, 448L, 401L, 530L, 314L, 
    246L, 318L, 338L, 227L, 205L, 201L, 170L, 237L, 166L, 165L, 
    173L, 135L, 171L, 116L, 97L, 139L, 123L, 94L, 102L, 100L, 
    162L, 122L, 86L, 73L, 101L, 79L, 49L, 68L, 28L, 37L, 41L, 
    50L, 49L, 28L, 27L, 35L, 47L, 59L, 22L, 35L, 34L, 16L, 29L, 
    30L, 14L, 24L, 16L, 15L, 29L, 2L, 15L, 6L, 10L, 8L, 6L, 3L, 
    3L, 7L, 5L, 6L, 3L, 4L, 6L, 7L, 3L, 1L, 1L, 4L, 2L, 2L, 1L, 
    1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L))
sunny
  • 3,853
  • 5
  • 32
  • 62
  • Maybe you need to provide `ordered_tally`, the data.table. – Frank Oct 21 '15 at 15:25
  • @Frank what do you mean provide ordered_tally? That's what is pasted at the bottom of my question. What form would you like it in? – sunny Oct 21 '15 at 15:26
  • 1
    Let me put it this way: the code you have provided is of very little use, since it includes example rows counting up to 95 while we are only given a table that counts as high as 7. Here's a guide on what a good *reproducible* example would look like: http://stackoverflow.com/a/28481250/1191259 – Frank Oct 21 '15 at 15:30
  • 1
    @Frank it's been updated with full data. – sunny Oct 21 '15 at 15:34
  • 1
    Seems like the method you used to get `AsiaRows` etc in the first place should be replaced with one pairing country with region in an auxiliary table then merge. I think that's what @Frank was getting at as well. – MichaelChirico Oct 22 '15 at 14:43

1 Answers1

5

I'd suggest a named list instead of c:

region_indices = setNames(list(AsiaRows, NorthAmericaRows, SouthAmericaRows, AfricaRows, OceaniaRows, MiddleEastRows), region)

Then make it into a data.frame

m = setNames(stack(region_indices), c("row", "region"))

and then assign with

ordered_tally[m$row, region := as.character(m$region)]

At this point, rows that weren't included in region_indices have NA for region. If you're sure that those are all Europe, then use

ordered_tally[is.na(region), region := "Europe"]

Comment. Generally, it's unsafe to have your data stored via row number. (I mean, you're liable to create trouble for yourself down the road.) It would be much better of you had a regions list by country name.

Frank
  • 66,179
  • 8
  • 96
  • 180
  • 1
    I agree with your point. That's the next thing on the to-do list, now that I typed out the row numbers I won't have to type out the country names but can generate the list. – sunny Oct 21 '15 at 16:31