
I'm quite new to R, and we have to do a project about COVID-19. I have downloaded a fairly big CSV file with more than 300,000 rows of data about the country, the region, the city, and how much certain activities have decreased or increased from baseline there. This is an example of the structure of my data frame:

country_code       country_region sub_region_1 sub_region_2       date retail_rec groc_phar park transit work res
1            AE United Arab Emirates                           2020-02-15          0         4    5       0    2   1
2            AE United Arab Emirates                           2020-02-16          1         4    4       1    2   1
3            AE United Arab Emirates                           2020-02-17         -1         1    5       1    2   1
4            AE United Arab Emirates                           2020-02-18         -2         1    5       0    2   1
5            AE United Arab Emirates                           2020-02-19         -2         0    4      -1    2   1
6            AE United Arab Emirates                           2020-02-20         -2         1    6       1    1   1
7            AE United Arab Emirates                           2020-02-21         -3         2    6       0   -1   1
8            AE United Arab Emirates                           2020-02-22         -2         2    4      -2    3   1
9            AE United Arab Emirates                           2020-02-23         -1         3    3      -1    4   1
10           AE United Arab Emirates                           2020-02-24         -3         0    5      -1    3   1
11           AE United Arab Emirates                           2020-02-25         -3         2    3      -2    3   1
12           AE United Arab Emirates                           2020-02-26         -2         1   -3      -2    3   1
13           AE United Arab Emirates                           2020-02-27          1         5   -1      -1    3   1
14           AE United Arab Emirates                           2020-02-28          1         5   -1      -1    1   1
15           AE United Arab Emirates                           2020-02-29          2         7   -1      -1    5   0
16           AE United Arab Emirates                           2020-03-01          3        10    2      -1    4   1
17           AE United Arab Emirates                           2020-03-02          0         7    1      -2    4   1
18           AE United Arab Emirates                           2020-03-03          0         6    0      -5    4   1
19           AE United Arab Emirates                           2020-03-04         -1         7   -2      -5    3   2
20           AE United Arab Emirates                           2020-03-05         -3         6   -2      -5    3   2
21           AE United Arab Emirates                           2020-03-06         -7         5   -8      -9    0   3
22           AE United Arab Emirates                           2020-03-07         -3         6    1      -8    4   2
23           AE United Arab Emirates                           2020-03-08          1         8    6      -9   -1   3
24           AE United Arab Emirates                           2020-03-09         -3         4    4     -10   -1   4
25           AE United Arab Emirates                           2020-03-10         -4         6    3     -11   -2   4
26           AE United Arab Emirates                           2020-03-11         -4         5    0     -12   -2   4
27           AE United Arab Emirates                           2020-03-12         -8         6   -6     -15   -3   5            

How can I create a new data frame that contains only the data from the 10 countries that I need?

CAAlonso

1 Answer


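Using the dplyr package, and assuming your data frame is named df:

library(dplyr)

# Names of the countries to keep (extend this vector to all ten)
ten_countries <- c("United Arab Emirates", "Xanadu", "Otherland", "Neverland")

# Keep only the rows whose country_region is in that vector
df_ten <- df %>%
  filter(country_region %in% ten_countries)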

If country_region isn't the column that holds the country names in your data, use the correct column instead. ;-)
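
If you would rather not load a package, the same filter can probably be written in base R. The sketch below is only an illustration: the small data frame is made up to stand in for the real file, and the names df_selected and missing_countries are hypothetical. It also checks which requested names never appear in the data, which may help catch spelling differences between your vector and the CSV.

```r
# Toy data standing in for the real mobility file (values are made up for illustration).
df <- data.frame(
  country_region = c("United Arab Emirates", "Xanadu", "Otherland", "Elsewhere"),
  date = as.Date(c("2020-02-15", "2020-02-15", "2020-02-16", "2020-02-16")),
  retail_rec = c(0, 3, -2, 5),
  stringsAsFactors = FALSE
)

# Placeholder country names, as in the answer above.
ten_countries <- c("United Arab Emirates", "Xanadu", "Otherland", "Neverland")

# Base R equivalent of the dplyr filter: keep rows whose country_region is in the vector.
df_selected <- df[df$country_region %in% ten_countries, ]

# Requested names that do not occur in the data at all (possible typos or alternate spellings).
missing_countries <- setdiff(ten_countries, unique(df$country_region))
```

If missing_countries is not empty, comparing it against unique(df$country_region) should show how those countries are actually spelled in the file.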

Martin Gal