I have a CSV file that I would like to read into R and reshape so that I can work with the data.

The general layout of the file is as follows. Note that the file consists of several blocks, each with several thousand rows and about 50 columns (every block has the same number of columns). The titles "Block One/Two/Three...Twenty" do not exist in the original file; I inserted them only for clarification. Please follow the link underneath for a simplified structure of the data.

Simplified data structure.

The problem is that I cannot work with the data in this format. Has anyone had a similar problem and found a way to read each block into its own data frame, for example?

The full file can be accessed here: https://www.dropbox.com/s/aal3ypgh3h82t5h/Data.csv?dl=0

Thanks in advance for any help; it would be much appreciated!

c.

cthulhukk
  • It's better to include a simplified [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in the question itself rather than linking to external files. Also be sure to clearly provide the desired output for the given input. – MrFlick Jul 07 '17 at 21:25
  • Thank you for your feedback, MrFlick. I will keep your tip in mind for future questions! – cthulhukk Jul 07 '17 at 21:33
  • How do we know where each block starts and ends? If you know the start and end rows, you can just slice up the full data frame after you `read.csv()`. Since there doesn't appear to be a character delimiter between the blocks, why not just read in the whole thing, and then split by row indices, or date/factor filters (whatever defines each block)? – Mako212 Jul 07 '17 at 22:44
  • Thanks for your response @HenryRice! And sorry, I should have been clearer: unfortunately, we don't know where each block begins and ends (the number of rows will be different each time I read the data), so the only clue might be the empty rows between the blocks... – cthulhukk Jul 07 '17 at 22:49
  • @cthulhukk Let me rephrase the question: what's different between each block? We can only split by a consistent rule, or set of rules, and blank lines seem inconsistent when I read through the data. Why can't you export the data as separate files for each block? – Mako212 Jul 07 '17 at 22:59
  • Sorry for the late reply. The blocks differ in the heading in row 3 of each block, e.g. "No. of Issues", "Maturity", "Years to Workout", etc. The dates are the same in every block, as are heading rows 1 and 2. – cthulhukk Jul 09 '17 at 20:24
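
One possible approach, following the rule that emerges from the comments above, is to treat every repetition of heading row 1 as the start of a new block and to label each block by its row 3 heading. The sketch below is untested against the real file and rests on several assumptions: the file is comma-separated, heading row 1 repeats character for character at the top of each block, each block has exactly three heading rows, and the first non-empty field of row 3 is a usable label. The function name `split_blocks`, the parameter `n_header`, and the toy input are all hypothetical and serve only as illustration.

```r
# Split the lines of a multi-block CSV into a named list of data frames.
# Assumptions (not verified against the real file): heading row 1 repeats
# verbatim at the top of every block, and each block has `n_header` heading rows.
split_blocks <- function(lines, n_header = 3) {
  # Drop separator lines that are empty or contain only commas/whitespace
  lines <- lines[!grepl("^[,[:space:]]*$", lines)]

  # A block starts wherever the first heading line repeats
  starts <- which(lines == lines[1])
  ends <- c(starts[-1] - 1, length(lines))

  read_chunk <- function(x) {
    read.csv(text = paste(x, collapse = "\n"), header = FALSE,
             stringsAsFactors = FALSE)
  }

  # Read the rows below the heading rows of each block
  blocks <- lapply(seq_along(starts), function(i) {
    read_chunk(lines[(starts[i] + n_header):ends[i]])
  })

  # Label each block with the first non-empty field of its last heading row
  labels <- vapply(starts, function(s) {
    fields <- as.character(unlist(read_chunk(lines[s + n_header - 1]),
                                  use.names = FALSE))
    fields <- fields[!is.na(fields) & nzchar(fields)]
    if (length(fields) > 0) fields[1] else NA_character_
  }, character(1))

  setNames(blocks, labels)
}

# Made-up toy input that mimics the described layout (not taken from the real file)
toy <- c(
  "Name,Bond A,Bond B",
  "Code,A1,B1",
  ",No. of Issues,No. of Issues",
  "2017-01-02,1,2",
  "2017-01-03,3,4",
  ",,",
  "Name,Bond A,Bond B",
  "Code,A1,B1",
  ",Maturity,Maturity",
  "2017-01-02,5,6",
  "2017-01-03,7,8"
)

blocks <- split_blocks(toy)
# For the real file, something like: blocks <- split_blocks(readLines("Data.csv"))
```

Each block can then be accessed by its label, for example `blocks[["Maturity"]]`. The columns keep the default names `V1`, `V2`, ...; if heading row 1 or 2 holds the column identifiers, those could be assigned with `names()` afterwards. If heading row 1 does not repeat exactly, the `starts` rule would need to be replaced by whatever consistently marks the top of a block.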
