0

I want to update specific rows in a CSV file that has dates with a data frame that I created in R.

01/04/20, Asset, Position, Price, Mark-to-Market
0, PORTFOLIO, NA, NA, 1000000
1, CASH, NA, NA, 1000000

02/04/20, Asset, Position, Price, Mark-to-Market, Position prior, Transaction, TC spent
0, PORTFOLIO, NA, NA, 999231, NA, NA, NA
1, CASH, NA, NA, 509866, NA, NA, NA
2, FUTURES, 500, 2516, 1258250, 0, 500, 629
3, VXc1, -5931, 47, -279795, 0, -5931, 140
 , Total, Buys:, 1, Sells:, 1, TC spent:, 769

There are approximately 1000+ rows.

However, I am unable to read this CSV file using the following codes. Can anyone help me with this?

df4 <- read.csv("filename.csv")

Further, I have to add two columns (2 and 3) from df3 mentioned below in the rows of df4 that have dates (except the first row). Can anyone help me with this as well?

The code to get df3 is as follows. However, I don't know how to add the rows to df4 selectively in R.

df1 <- read.csv("filename1.csv")
df2 <- read.csv("filename2.csv")
df3 <- cbind(df2[,c(1)], df1[,c(3)], df2[,c(3)])
Abhik Paul
  • 39
  • 1
  • 9
  • 2
    "However, I am unable to read this CSV file using the read.csv command. Can anyone help me with this?" How can we, who know neither the structure of the file, nor the code being used to read the file, nor the errors that you are getting when you run the code, possibly be able to answer? Please provide a [mcve]. See [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269/4996248) for what this would mean in R. – John Coleman Feb 20 '21 at 15:01
  • Usually, the StackExchange suggestions for including sample data in a question presume that you already have the data in R, so methods like `dput(.)` make sense. However, in this case, I suggest you copy the raw CSV file contents (not from Excel or similar, the raw contents with commas and all), then [edit] your question and paste into a [code block](https://stackoverflow.com/editing-help). And then (as already suggested) include the code you attempted to use to read in that file. *That* should make this question a lot more reproducible. Good luck! – r2evans Feb 20 '21 at 15:07
  • 1
    Apologies for such crude questions. Thank you @G.Grothendieck and others for your comments. Have edited my question accordingly. Please let me know if this helps – Abhik Paul Feb 20 '21 at 15:35
  • That's a huge improvement, AbhikPaul, thank you. – r2evans Feb 20 '21 at 15:36

1 Answers1

1

I'm not sure what you need for your second question, but to address the first:

txt <- readLines("filename.csv")
# Warning in readLines("filename.csv") :
#   incomplete final line found on 'filename.csv'
multidf <- by(txt, cumsum(!grepl("\\S", txt)),
              FUN = function(x) read.csv(text = x, strip.white = TRUE))
multidf
# cumsum(!grepl("\\S", txt)): 0
#   X01.04.20     Asset Position Price Mark.to.Market
# 1         0 PORTFOLIO       NA    NA        1000000
# 2         1      CASH       NA    NA        1000000
# --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
# cumsum(!grepl("\\S", txt)): 1
#   X02.04.20     Asset Position Price Mark.to.Market Position.prior Transaction TC.spent
# 1         0 PORTFOLIO     <NA>    NA         999231             NA        <NA>       NA
# 2         1      CASH     <NA>    NA         509866             NA        <NA>       NA
# 3         2   FUTURES      500  2516        1258250              0         500      629
# 4         3      VXc1    -5931    47        -279795              0       -5931      140
# 5        NA     Total    Buys:     1         Sells:              1   TC spent:      769

The multidf object is technically a "by"-class object, but that's really just a glorified list:

str(multidf)
# List of 2
#  $ 0:'data.frame':    2 obs. of  5 variables:
#   ..$ X01.04.20     : int [1:2] 0 1
#   ..$ Asset         : chr [1:2] "PORTFOLIO" "CASH"
#   ..$ Position      : logi [1:2] NA NA
#   ..$ Price         : logi [1:2] NA NA
#   ..$ Mark.to.Market: int [1:2] 1000000 1000000
#  $ 1:'data.frame':    5 obs. of  8 variables:
#   ..$ X02.04.20     : int [1:5] 0 1 2 3 NA
#   ..$ Asset         : chr [1:5] "PORTFOLIO" "CASH" "FUTURES" "VXc1" ...
#   ..$ Position      : chr [1:5] NA NA "500" "-5931" ...
#   ..$ Price         : int [1:5] NA NA 2516 47 1
#   ..$ Mark.to.Market: chr [1:5] "999231" "509866" "1258250" "-279795" ...
#   ..$ Position.prior: int [1:5] NA NA 0 0 1
#   ..$ Transaction   : chr [1:5] NA NA "500" "-5931" ...
#   ..$ TC.spent      : int [1:5] NA NA 629 140 769

From here, you can keep it as a list (can be good, see https://stackoverflow.com/a/24376207/3358227) or try to combine into a single frame (the same link has info for that, too).

r2evans
  • 141,215
  • 6
  • 77
  • 149
  • can you help me with the next step. Here is the exact requirement https://stackoverflow.com/q/66294884/13455074 – Abhik Paul Feb 21 '21 at 02:06