Introduction
I have longitudinal data in wide format recording the total sales value of each company per year. From this, I want to create a new set of market-share variables, one for each year in the data, giving each company's share of that year's total sales. The full data set is too large to do this the long, clumsy way, so I tried to run a function over a subset of the columns (i.e. the columns holding the sales data for each year) using sapply.
However, the results do not seem to produce 'real' variables: they show up when printed (head()) but not in reality (names()). Is something wrong with my code?
# SAMPLE DATA
agyrw <- structure(list(company = c(28, 128, 22, 72, 62, 65, 132, 89, 46, 105), value.1993 = c(79272, 35850, 2124, 32, 0, 0, 0, 26359, 0, 0), value.1994 = c(103974, 10219, 31432, 0, 0, 0, 3997, 469, 0, 0)), .Names = c("company", "value.1993", "value.1994"), row.names = c(9L, 42L, 1L, 30L, 22L, 28L, 51L, 34L, 20L, 40L), class = "data.frame")
agyrw2 <- agyrw # FOR A LATER COMPARISON
agyrw
# company value.1993 value.1994
# 28 79272 103974
# 128 35850 10219
# 22 2124 31432
# 72 32 0
# 62 0 0
# 65 0 0
# 132 0 3997
# 89 26359 469
# 46 0 0
# 105 0 0
Clumsy Long Way
# SUM TOTAL VALUE BY YEAR
total.1993 <- sum(agyrw$value.1993)
total.1994 <- sum(agyrw$value.1994)
# CALCULATE THE MARKET SHARE FOR EACH IMPORTER, BY YEAR
agyrw$share.1993 <- agyrw$value.1993 / total.1993
agyrw$share.1994 <- agyrw$value.1994 / total.1994
# FORMAT THE MARKET SHARE VARIABLE TO ONLY FOUR DECIMAL PLACES
agyrw$share.1993 <- format(round(agyrw$share.1993, 4), nsmall = 4)
agyrw$share.1994 <- format(round(agyrw$share.1994, 4), nsmall = 4)
# RECONVERT THE MARKET SHARE VARIABLE BACK INTO NUMERIC
agyrw$share.1993 <- as.numeric(agyrw$share.1993)
agyrw$share.1994 <- as.numeric(agyrw$share.1994)
# VIEW
agyrw
# company value.1993 value.1994 share.1993 share.1994
# 28 79272 103974 0.5519 0.6927
# 128 35850 10219 0.2496 0.0681
# 22 2124 31432 0.0148 0.2094
# 72 32 0 0.0002 0.0000
# 62 0 0 0.0000 0.0000
# 65 0 0 0.0000 0.0000
# 132 0 3997 0.0000 0.0266
# 89 26359 469 0.1835 0.0031
# 46 0 0 0.0000 0.0000
# 105 0 0 0.0000 0.0000
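Side note on the steps above: the format()/as.numeric() round trip should not change the stored numbers beyond what round() already does, because format() only controls how a number is displayed and converting back to numeric discards that. A minimal sketch checking this on the 1993 sample values (the vector name v1993 is just for illustration):

```r
# 1993 sales values from the sample data
v1993 <- c(79272, 35850, 2124, 32, 0, 0, 0, 26359, 0, 0)

share <- v1993 / sum(v1993)

# Round trip used in the clumsy code: round, format as text, parse back
via_format <- as.numeric(format(round(share, 4), nsmall = 4))

# Rounding alone
via_round <- round(share, 4)

# all.equal() tolerates floating-point representation noise
isTRUE(all.equal(via_format, via_round))  # TRUE
```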
Parsimonious Attempt
agyrw2$share <- sapply(agyrw2[, 2:3], function(x) {
  total <- sum(x)
  share <- as.numeric(format(round(x / total, 4), nsmall = 4))
  return(share)
})
# VIEW
agyrw2
# company value.1993 value.1994 share.value.1993 share.value.1994
# 28 79272 103974 0.5519 0.6927
# 128 35850 10219 0.2496 0.0681
# 22 2124 31432 0.0148 0.2094
# 72 32 0 0.0002 0.0000
# 62 0 0 0.0000 0.0000
# 65 0 0 0.0000 0.0000
# 132 0 3997 0.0000 0.0266
# 89 26359 469 0.1835 0.0031
# 46 0 0 0.0000 0.0000
# 105 0 0 0.0000 0.0000
Problem
Upon initial inspection, everything looks fine. The agyrw2 produced with sapply looks the same as the agyrw produced by the clumsy code (save for slightly different column names).
But when I try to call any of the newly created variables in agyrw2, they seemingly don't exist, despite showing up when printed. For example, listing the column names shows only a single agyrw2$share column:
names(agyrw)
#[1] "company" "value.1993" "value.1994" "share.1993" "share.1994"
names(agyrw2)
#[1] "company" "value.1993" "value.1994" "share"
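A quick check suggests why only one name appears: sapply() simplifies its per-column results into a matrix (one column per input column), and assigning that matrix to agyrw2$share stores the whole matrix as a single data-frame column. Printing displays its two columns as share.value.1993 and share.value.1994, but names() only sees share. A minimal sketch reproducing this on a rebuilt copy of the sample data (rounding omitted for brevity):

```r
# Rebuild the sample data (company IDs and yearly sales values)
agyrw2 <- data.frame(
  company    = c(28, 128, 22, 72, 62, 65, 132, 89, 46, 105),
  value.1993 = c(79272, 35850, 2124, 32, 0, 0, 0, 26359, 0, 0),
  value.1994 = c(103974, 10219, 31432, 0, 0, 0, 3997, 469, 0, 0)
)

# Same idea as the attempt above: sapply() returns a 10 x 2 matrix
agyrw2$share <- sapply(agyrw2[, 2:3], function(x) x / sum(x))

is.matrix(agyrw2$share)   # TRUE
dim(agyrw2$share)         # 10 2
colnames(agyrw2$share)    # "value.1993" "value.1994"
ncol(agyrw2)              # 4: the whole matrix counts as one column
```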
How can I rewrite the function so that it actually produces new columns in the data frame?
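One common way to get separate columns (a sketch, not the only option) is to use lapply(), which returns a plain list instead of a matrix, and assign that list to several named columns at once with single-bracket indexing. The names share.1993 and share.1994 are built here by swapping the "value." prefix for "share."; the helper names value_cols and share_cols are illustrative, and round() alone handles the four-decimal rounding:

```r
# Rebuild the sample data (company IDs and yearly sales values)
agyrw2 <- data.frame(
  company    = c(28, 128, 22, 72, 62, 65, 132, 89, 46, 105),
  value.1993 = c(79272, 35850, 2124, 32, 0, 0, 0, 26359, 0, 0),
  value.1994 = c(103974, 10219, 31432, 0, 0, 0, 3997, 469, 0, 0)
)

# Pick every yearly sales column by its "value." prefix
value_cols <- grep("^value\\.", names(agyrw2), value = TRUE)

# Matching output names: value.1993 -> share.1993, etc.
share_cols <- sub("^value\\.", "share.", value_cols)

# lapply() returns a list of vectors, so each one becomes its own column
agyrw2[share_cols] <- lapply(agyrw2[value_cols], function(x) round(x / sum(x), 4))

names(agyrw2)
# "company" "value.1993" "value.1994" "share.1993" "share.1994"
```

Because the column names are derived from the data, the same code covers any number of value.YYYY columns without listing years by hand.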