0

Unsure whether the title encompasses everything I am trying to do here.

I am doing an analysis of a client database and as such we have Dataframe 1 in which each row represents a unique client (by a client id).

I then have another dataframe which lists the assets clients have. However, each row represents a unique asset (by an asset id). Therefore, a client's id can appear many times meaning that the two dataframes cannot be merged without creating another variable.

I want to create some variables that represent the portion of a client's investment in a certain asset type, as well as their total assets.

Is there an easy way to do this? Such as a group_by clientid and then group the asset types and mean?

Nalhcal
  • 75
  • 8
  • 2
    We need data to tell. Can you edit the question with the output of `dput(head(df2, 30))`? – Rui Barradas Jul 26 '20 at 02:57
  • 2
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jul 26 '20 at 03:08

1 Answers1

1

I have recreated a scenario trying to emulate the issue you are facing to the best of my understanding of your situation. Hopefully, it should at least put you on the path to the answer you are looking for.

You can copy paste the below code in your R console to go through all the steps.

library(dplyr)

######## Create the client database, assuming 4 different asset classes and an asset value of 1 for each of them.
df <- cbind.data.frame(clientId = c(1,1,2,3,3,3,4,4,4,5,5,6,6,7,8,9,9,10,10,10),AssetCategory= rep(c('a','b','c','d'),5),AssetValue =rep(c(1),20))

#Calculating the clients' total assets
totalAssetByClient <- df %>% group_by(clientId) %>% summarize(totalAssetByClient = sum(AssetValue))

# Appending TotalAssetByClient variable to the dataframe (client database) <- Answer to your FIRST question
df2 <- left_join(df,totalAssetByClient,by = "clientId")


#  Then Create an empty dataset to host the AssetShareByClient table
AssetShareByClient <- data.frame(clienId = integer(), AssetCategory = character(), AssetShareByClient = double())

# Creating filling the AssetShareByClient table with a nested for Loop (sorry no easy way)
for (client in unique(df2$clientId))
{
for (asset in unique(df2$AssetCategory))
{    
    df3 <- filter(df2, clientId == client, AssetCategory == asset)
    AssetShareByClient <- rbind(AssetShareByClient, c(client,asset,sum(df3$AssetValue)/mean(df3$totalAssetByClient)))
}
}

# We now have a standalone table with a column showing the proportion of investment per asset for each cient <- Answer to your SECOND question
# When the client has 0% share of an asset category it shows NaN. The sum of asset share category  for each client sums to 100%
names(AssetShareByClient) = c("clientId","AssetCategory","AssetShareByClient")
print(AssetShareByClient) 
babkouew
  • 65
  • 5