I have a series of csv files that I am trying to analyse. The files have this structure.
"","x"
"1","<HTML>"
"2","<TITLE>Radiosonde Data</TITLE>"
"3","<LINK REL=""StyleSheet"" HREF=""/resources/select.css"" TYPE=""text/css"">"
"4","<BODY BGCOLOR=""white"">"
"5","<H2>15420 LRBS Bucuresti Observations at 00Z 01 Jan 1980</H2>"
"6","<PRE>"
"7","-----------------------------------------------------------------------------"
"8"," PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV"
"9"," hPa m C C % g/kg deg knot K K K "
"10","-----------------------------------------------------------------------------"
"11"," 1000.0 -55 "
"12"," 981.0 91 1.4 1.4 100 4.34 45 26 276.1 288.0 276.8"
"13"," 850.0 1257 3.6 3.6 100 5.86 155 28 289.9 306.9 290.9"
"14"," 700.0 2809 -5.1 -5.1 100 3.76 170 44 296.8 308.2 297.5"
"15"," 500.0 5350 -24.9 -25.7 93 0.95 180 36 302.6 305.8 302.8"
"16"," 400.0 6940 -36.0 -36.9 91 0.41 195 36 308.1 309.6 308.2"
"17"," 300.0 8870 -51.5 -52.8 86 0.10 200 46 312.6 313.0 312.7"
"18"," 250.0 10050 -60.3 -61.7 83 0.04 205 66 316.3 316.4 316.3"
"19"," 238.0 10345 -61.5 -62.9 83 0.03 200 80 319.0 319.1 319.0"
"20"," 200.0 11390 -65.7 -67.3 80 0.02 215 60 328.6 328.7 328.6"
"21"," 150.0 13170 -60.3 -61.9 81 0.06 225 56 366.0 366.3 366.0"
"22"," 100.0 15680 -62.1 -63.8 80 0.07 235 48 407.5 407.9 407.5""
"23","</PRE><H3>Station info</H3><PRE>"
"24"," Station ID: LRBS"
The aim is to extract the embedded tables (the data between lines 11-22 in the above example) as separate data frames with 11 columns. So far have I have written a script that reads all the text lines (using readLines) and then identifies the start and the end of each table. But in this way the data manipulation is not very easy. I am wondering what will be a more efficient way of extracting the tables.
Thank you very much for all your help.