0

I have list of company codes(say, this dataframe is called b), which has about 130 companies. In another dataframe(this dataframe is called bb), each column is a company's performance. The name for a centain company's column is just that company's code.

I want to extract each company's performance and put these information into a separate dataframe. I am thinking about creating a foreach to refer each company code in b, then do the extraction.

Following is the example for b and bb

>b
12572 37540 25658 99999 19682 10111 15024 19232 24953 23779 14834 25143 23035 16578 19933 18988 14168 18325 38148 17558 36560 ...
>bb
year ZIP1 month COMPANY_CD marketshare%
2000 10003 1    12572      10
2000 10003 1    37540      20
2000 10003 1    25658      10
2000 10003 1    19682      10
...
2000 10005 1    12572      8
2000 10005 1    23779      8
2000 10005 1    14834      8
...
2001 10003 1    12572      20
2001 10003 1    17558      20
2001 10003 1    25143      20

a is the target dataframe for each company.

>a
year ZIP1 month COMPANY_CD marketshare%
2000 10003 1    12572      10
2000 10005 1    12572      8
...
2001 10003 1    12572      20

Pseudo code is:

a <- bb %>% filter(COMPANY_CD == "12572")%>% select(year,month,`marketshare%`) %>% arrange(year,month,ZIP1)
names(a) <- c("year","month","ZIP1","12572")
Jonathan Hall
  • 75,165
  • 16
  • 143
  • 189
Jian Zhang
  • 15
  • 3

2 Answers2

0

For-loops are rarely the best way to approach a problem in R. Since the data are in data frame formats already, you likely want a left outer join of b -> bb. A related SO answer can be found here.

In short, I believe you are looking for something like the following assuming that the column in b is also named COMPANY_CD.

a <- merge(x = b, y = bb, by = "COMPANY_CD", all.x = TRUE)
bdempe
  • 308
  • 2
  • 9
0

If you have a frame and want to split it up by the company id, then you don't need anything fancy.

split(df, df$COMPANY_CD)
# $`12572`
#   year  ZIP1 month COMPANY_CD marketshare.
# 1 2000 10003     1      12572           10
# 5 2000 10005     1      12572            8
# 8 2001 10003     1      12572           20
# $`14834`
#   year  ZIP1 month COMPANY_CD marketshare.
# 7 2000 10005     1      14834            8
# $`17558`
#   year  ZIP1 month COMPANY_CD marketshare.
# 9 2001 10003     1      17558           20
# $`19682`
#   year  ZIP1 month COMPANY_CD marketshare.
# 4 2000 10003     1      19682           10
# $`23779`
#   year  ZIP1 month COMPANY_CD marketshare.
# 6 2000 10005     1      23779            8
# $`25143`
#    year  ZIP1 month COMPANY_CD marketshare.
# 10 2001 10003     1      25143           20
# $`25658`
#   year  ZIP1 month COMPANY_CD marketshare.
# 3 2000 10003     1      25658           10
# $`37540`
#   year  ZIP1 month COMPANY_CD marketshare.
# 2 2000 10003     1      37540           20

If you only want this for a subset of the companies, then

dftmp <- df[ df$COMPANY_CD %in% c(12572, 14834, 17558, 99999), ]
split(dftmp, dftmp$COMPANY_CD)
# $`12572`
#   year  ZIP1 month COMPANY_CD marketshare.
# 1 2000 10003     1      12572           10
# 5 2000 10005     1      12572            8
# 8 2001 10003     1      12572           20
# $`14834`
#   year  ZIP1 month COMPANY_CD marketshare.
# 7 2000 10005     1      14834            8
# $`17558`
#   year  ZIP1 month COMPANY_CD marketshare.
# 9 2001 10003     1      17558           20

Side note: I recommend against trying to put each company into separate variables. Since they are all structured the same, and I think it highly likely that what you do to one company will be done similarly to all of the other companies, it is usually best to deal with a list of frames instead of individual frames. See https://stackoverflow.com/a/24376207/3358227 for some good discussion on that topic.


Data:

df <- structure(list(year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2001L, 2001L, 2001L), ZIP1 = c(10003L, 10003L, 10003L, 10003L, 10005L, 10005L, 10005L, 10003L, 10003L, 10003L), month = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), COMPANY_CD = c(12572L, 37540L, 25658L, 19682L, 12572L, 23779L, 14834L, 12572L, 17558L, 25143L), marketshare. = c(10L, 20L, 10L, 10L, 8L, 8L, 8L, 20L, 20L, 20L)), class = "data.frame", row.names = c(NA, -10L))
r2evans
  • 141,215
  • 6
  • 77
  • 149