How to automate script which reads from CSV file with multiple sections?

Question

I have data in a text file that consists of both integer and non-integer rows; example data is shown below:

2012    10  6   3   57  22.3    33.18   73.71   0
KP  EP  3   57  36.36               
CET ES  3   58  17.22               
CET EP  3   57  55.12               
DHL ES  3   58  3.62                
2015    10  4   7   42  58  33.17   73.74   1.1
PDA EP  7   43  8.6             
NPR ES  7   43  27.98               
PAL EP  7   43  20.93               
CET ES  7   43  52.31               
CET EP  7   43  30.46               
2009    10  4   11  19  4.6 33.16   73.71   0
CET ES  11  19  59.6                
CET EP  11  19  37.12               
THW EP  11  19  37

In the above case, my data has 3 sets; each set has one integer row followed by a variable number (2, 6, 10, 15, 40, etc.) of non-integer rows. I need to write code that concatenates the integer and non-integer rows of the first set and saves the result in a text file. It should then take the second set, repeat the same process, append the result to the same text file after the first set, and so on.

(For clarity, I have made my question more generic so that it is easier to understand.)

Would you be so kind as to delete non-relevant code from your question? E.g. the `figures` part, the plotting, etc. They all seem irrelevant to your question and make it hard to read. — Jan, Jun 14 '21 at 07:34
This code is a mess. Try dividing it into sub-routines with coherent names that each solve a single issue in a general way. Create a function that uses these sub-routines to solve a general case; once you have that, you simply need to divide your data into general cases and use that function to get what you want. Hope this helps. — Almog-at-Nailo, Jun 14 '21 at 07:45
OK, you need to say that your input file format has multiple **sections**, essentially separate CSV files. Each section starts with a header row with integers (5 integers and 3 floats), followed by data rows that start with alphabetic characters. And you want to write reader code to output each section separately. You'll want a `csv_section_reader` generator which yields each section separately, then pushes back the header row of the next section onto the input. (Alternatively, you could just use `pd.read_csv(nrows, skiprows)` to say which lines to read.) Could you please clean up your code a little? — smci, Jun 14 '21 at 08:14

smci · Answer 1 · 2021-06-14T10:37:37.443

You want to read an input file which has multiple sections, each of which is a separate CSV file. Here are two ways; if you're a beginner coder, semiautomatically implementing a) is going to be far easier than automating b).

a) Easiest: simply get each section's start row and length, then pass them into `pd.read_csv(..., nrows=..., skiprows=...)`

See this answer

You can find out each section's start row with:

`egrep -n '^[0-9]' the.csv`
1:2012    10  6   3   57  22.3    33.18   73.71   0
6:2015    10  4   7   42  58  33.17   73.74   1.1
12:2009    10  4   11  19  4.6 33.16   73.71   0

Note that `egrep -n` numbers lines starting from 1, but `skiprows` counts from 0, so your start rows will be [0, 5, 11].

>>> pd.read_csv('the.csv', skiprows=0, nrows=4)
  2012    10  6   3   57  22.3    33.18   73.71   0
0              KP  EP  3   57  36.36               
1              CET ES  3   58  17.22               
2              CET EP  3   57  55.12               
3              DHL ES  3   58  3.62                

>>> pd.read_csv('the.csv', skiprows=5, nrows=5)
  2015    10  4   7   42  58  33.17   73.74   1.1
0                PDA EP  7   43  8.6             
1            NPR ES  7   43  27.98               
2            PAL EP  7   43  20.93               
3            CET ES  7   43  52.31               
4            CET EP  7   43  30.46               

>>> pd.read_csv('the.csv', skiprows=11, nrows=9999)
  2009    10  4   11  19  4.6 33.16   73.71   0
0          CET ES  11  19  59.6                
1          CET EP  11  19  37.12               
2                         THW EP  11  19  37
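If you would rather not copy the row numbers by hand, the egrep step and the off-by-one arithmetic could be done in Python along these lines. This is only a sketch: `section_bounds` is a hypothetical helper name, and it assumes that a header row is any line whose first character is a digit (the same rule as the `^[0-9]` pattern above) and that the file contains no blank lines.

```python
import re


def section_bounds(lines):
    """Return a (skiprows, nrows) pair for each section in `lines`.

    Assumption: a section starts at a line whose first character is a
    digit; every following line up to the next such line is a data row.
    """
    # 0-based indices of the header rows (egrep -n would report these + 1)
    starts = [i for i, line in enumerate(lines) if re.match(r"[0-9]", line)]
    # each section ends where the next one starts; the last ends at EOF
    ends = starts[1:] + [len(lines)]
    # subtract 1 so the header row itself is not counted as a data row
    return [(start, end - start - 1) for start, end in zip(starts, ends)]


sample_lines = [
    "2012 10 6 3 57 22.3 33.18 73.71 0",
    "KP EP 3 57 36.36",
    "CET ES 3 58 17.22",
    "CET EP 3 57 55.12",
    "DHL ES 3 58 3.62",
    "2015 10 4 7 42 58 33.17 73.74 1.1",
    "PDA EP 7 43 8.6",
    "NPR ES 7 43 27.98",
    "PAL EP 7 43 20.93",
    "CET ES 7 43 52.31",
    "CET EP 7 43 30.46",
    "2009 10 4 11 19 4.6 33.16 73.71 0",
    "CET ES 11 19 59.6",
    "CET EP 11 19 37.12",
    "THW EP 11 19 37",
]

print(section_bounds(sample_lines))  # [(0, 4), (5, 5), (11, 3)]
```

For a real file you would pass in something like `open('the.csv').read().splitlines()`, then loop over the pairs and call `pd.read_csv('the.csv', skiprows=start, nrows=count)` for each one.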

b) Harder: write a Python generator `multiple_csv_section_reader()` which:

- Reads each CSV section separately from the input, and yields it as a chunk. You can then pass this chunk into `pd.read_csv(StringIO(...))`.
- Detects the start of a new CSV section when it sees a header row with integers (5 integers and 3 floats)...
- ...then pushes back the header row of the next section onto the input.
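A minimal sketch of such a generator is shown here, under a couple of assumptions that are mine rather than part of the question: a header row is detected simply as a line whose first non-blank character is a digit, and blank lines are skipped. Instead of literally pushing the header row back onto the input, this version carries it over as the first line of the next chunk, which has the same effect and is simpler to write in Python.

```python
def multiple_csv_section_reader(lines):
    """Yield each section (header row plus its data rows) as one string.

    `lines` can be any iterable of text lines, e.g. an open file object.
    Assumption: a header row starts with a digit, a data row does not.
    """
    chunk = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        # A new header row closes the section collected so far.
        if line.lstrip()[0].isdigit() and chunk:
            yield "\n".join(chunk)
            chunk = []
        # The header that triggered the yield becomes the start of the
        # next chunk (this replaces the "push back" step).
        chunk.append(line)
    if chunk:
        yield "\n".join(chunk)  # last section has no following header


demo_lines = [
    "2012 10 6 3 57 22.3 33.18 73.71 0\n",
    "KP EP 3 57 36.36\n",
    "CET ES 3 58 17.22\n",
    "2015 10 4 7 42 58 33.17 73.74 1.1\n",
    "PDA EP 7 43 8.6\n",
]

sections = list(multiple_csv_section_reader(demo_lines))
for number, section in enumerate(sections, start=1):
    print(f"--- section {number} ---")
    print(section)
```

With a real file you would write `with open('the.csv') as f: ...` and iterate over `multiple_csv_section_reader(f)`; each yielded string can be wrapped in `io.StringIO` and handed to `pd.read_csv`, or appended to your output text file.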

Also, move all your plotting code into a separate function; we don't need to see it.
