I am used to loading a straight csv with the column headers and a single table into R, I have a large csv file that has the following structure:
+-----------+---------+--------+---------+--------+---------+
| file_name | | | | | |
+-----------+---------+--------+---------+--------+---------+
| table1 | | | | | |
+-----------+---------+--------+---------+--------+---------+
| Var1 | Var2 | Var3 | Var4 | Var5 | Var6 |
+-----------+---------+--------+---------+--------+---------+
| 198824 | 198824 | 198824 | 198824 | 198824 | 198824 |
+-----------+---------+--------+---------+--------+---------+
| 123 | 1234 | 1242 | 124 | 1241 | 1232 |
+-----------+---------+--------+---------+--------+---------+
| | | | | | |
+-----------+---------+--------+---------+--------+---------+
| | | | | | |
+-----------+---------+--------+---------+--------+---------+
| file_name | | | | | |
+-----------+---------+--------+---------+--------+---------+
| table2 | | | | | |
+-----------+---------+--------+---------+--------+---------+
| Var1 | Var2 | Var3 | Var4 | Var5 | Var6 |
+-----------+---------+--------+---------+--------+---------+
| x | x | x | x | x | x |
+-----------+---------+--------+---------+--------+---------+
| y | y | y | y | y | y |
+-----------+---------+--------+---------+--------+---------+
| z | z | z | z | z | z |
+-----------+---------+--------+---------+--------+---------+
| | | | | | |
+-----------+---------+--------+---------+--------+---------+
| | | | | | |
+-----------+---------+--------+---------+--------+---------+
| file_name | | | | | |
+-----------+---------+--------+---------+--------+---------+
| table3 | | | | | |
+-----------+---------+--------+---------+--------+---------+
| Var1 | Var2 | Var3 | Var4 | Var5 | Var6 |
+-----------+---------+--------+---------+--------+---------+
| 532523 | 25235 | 532523 | 25235 | 532523 | 25235 |
+-----------+---------+--------+---------+--------+---------+
| 25332 | 5325235 | 25332 | 5325235 | 25332 | 5325235 |
+-----------+---------+--------+---------+--------+---------+
The data is not completely unstructured in that it follows this pattern:
The first row only has the file name: file_name
The 2nd row has the table: table1, table2, table3, etc.
And the the actual table itself, i.e. 6 columns from var1 to var6 with the data below it.
Then there are 2 empty rows and the next set would start with the file_name repeating itself and then the next table number and the table within it
All the tables within the CSV follows this pattern, but I am having trouble even loading this into R, I get the following when loading it straight using read.csv():
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
Is this possible to load into one dataframe using R and also have the table number become a column, and var1-var6 + table number as the column headers?
i.e.
+--------+---------+--------+---------+--------+---------+--------------+
| Var1 | Var2 | Var3 | Var4 | Var5 | Var6 | table_number |
+--------+---------+--------+---------+--------+---------+--------------+
| 198824 | 198824 | 198824 | 198824 | 198824 | 198824 | table1 |
| 123 | 1234 | 1242 | 124 | 1241 | 1232 | table1 |
| x | x | x | x | x | x | table2 |
| y | y | y | y | y | y | table2 |
| z | z | z | z | z | z | table2 |
| 532523 | 25235 | 532523 | 25235 | 532523 | 25235 | table3 |
| 25332 | 5325235 | 25332 | 5325235 | 25332 | 5325235 | table3 |
+--------+---------+--------+---------+--------+---------+--------------+
Note that the number of rows for each table (table1, table2, etc.) have different number of rows.
The CSV file has around 200 tables total and is over the Excel limit (I think around 9MM rows)
Using Brian's Code, here are the first few lines:
> lines_all
[1] "name,,,,," "table1,,,,," "Var1,Var2,Var3,Var4,Var5,Var6" "321,54312,321,54654,3564,54321"
[5] "45,54,4564,54,87,456" ",,,,," ",,,,," "name,,,,,"
[9] "table2,,,,," "Var1,Var2,Var3,Var4,Var5,Var6" "ssvf,afs,fasf,afsaf,zxvz,zvx" "saf,zvx,zz,z,zxvz,zxvzxv"
[13] "zxvsaf,wr,wrw,afsaf,asf,af" ",,,,," ",,,,," "name,,,,,"
[17] "table3,,,,," "Var1,Var2,Var3,Var4,Var5,Var6" "1,2,3,4,5,6" "7,8,9,10,11,12"
[21] "13,14,15,16,17,18" "19,20,21,22,23,24"