Read csv and output to multiple csv files depending on criteria, nested conditions with more than 20 elements

Question

I have a very large csv file which looks like this:

Column1;Column2
01;BE
02;ED
12;FD
14;DS
03;ED
04;DF

Now I want to read this csv and depending on certain criteria I would like to export it to different multiple csv files.

My code is as follows:

import csv
import os
output_path=r'C:\myfolder\large_file.csv'

with open(os.path.join(os.path.dirname(output_path),"first_subset_total.csv"), "w", encoding="utf-8", newline='') as \
out_01, open(os.path.join(os.path.dirname(output_path),"excluded_first.csv"), "w", encoding="utf-8", newline='') as \
out_02, open(os.path.join(os.path.dirname(output_path),"pure_subset.csv"), "w", encoding="utf-8", newline='') as \
out_03_a, open(os.path.join(os.path.dirname(output_path),"final_subset.csv"), "w", encoding="utf-8", newline='') as \
out_04_b:
    
    cw01 = csv.writer(out_01, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    cw02 = csv.writer(out_02, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    cw03_a = csv.writer(out_03_a, delimiter=";", quoting=csv.QUOTE_MINIMAL)
    cw04_b = csv.writer(out_04_b, delimiter=";", quoting=csv.QUOTE_MINIMAL)

    with open(output_path, encoding="utf-8") as in_f:
        cr = csv.reader(in_f, delimiter=";")
        header = next(cr) 
        cw01.writerow(header)
        cw02.writerow(header)
        cw03_a.writerow(header)
        cw04_b.writerow(header)

        for line in cr:
            if (line[0][:2] =="01" and ...): cw01.writerow(line)  
            if (line[0][:2] =="02"): cw02.writerow(line)  
            if (line[0][:2] =="03" and ...): cw03_a.writerow(line)  
            if (line[0][:2] =="04" and ...): cw04_b.writerow(line)

My first problem is that I have many if statements and many more than 4 files. Some of them also have subset suffixes such as 04_a and 04_b. The example above uses 4 files, but in reality there are far more than 20, with the same number of if statements. There are so many that I get a SyntaxError: too many statically nested blocks error, because there are more than 20 nested conditions. My current workaround is to put the remaining conditions into a second loop, which is inefficient and not a good solution. I also doubt the readability of my code and my approach in general. How can I do all this more efficiently?

score 0 · Answer 1 · answered Jan 11 '23 at 10:37

The problem?

I am not sure I fully understand your problem. I assume that you originally used some kind of nested if-else structure that raised the syntax error, and that the code you show is your fix. That fix is not as efficient as it could be, because the conditions in the if statements are actually mutually exclusive: if the first one is true, all the others are false, yet you still check every one of them.

Simple solution

If I understood the problem correctly, the solution is simple: replace your ifs with elif. elif is the contraction of else and if (duh...), and it lets you avoid big nested structures, as follows:

# ...
for line in cr:
  if (line[0][:2] =="01" and ...): cw01.writerow(line)  
  elif (line[0][:2] =="02"): cw02.writerow(line)  
  elif (line[0][:2] =="03" and ...): cw03_a.writerow(line)  
  elif (line[0][:2] =="04" and ...): cw04_b.writerow(line)

It is true that this is still hard to read, but if you align your code nicely it is already pretty acceptable, although I will admit that it can lead to a lot of spaghetti code.
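If the elif chain grows past 20 branches, one possible way to rework the structure is to describe the routing as data instead of code. The following is only a sketch under assumptions, not a drop-in replacement: the ROUTES table, the split_csv function and the always placeholder are hypothetical names, the file names are taken from the question, and the elided extra conditions (the "and ..." parts) are stood in for by a predicate that accepts every row. contextlib.ExitStack opens any number of files without nesting with blocks, which is one way to stay clear of the "too many statically nested blocks" limit.

```python
import csv
import os
from contextlib import ExitStack


def always(line):
    """Placeholder for the extra conditions elided as '...' in the question."""
    return True


# Hypothetical routing table: two-character prefix -> (output file, extra condition).
# Add one entry per output file instead of one if/elif branch.
ROUTES = {
    "01": ("first_subset_total.csv", always),
    "02": ("excluded_first.csv", always),
    "03": ("pure_subset.csv", always),
    "04": ("final_subset.csv", always),
}


def split_csv(input_path, routes=ROUTES, delimiter=";"):
    """Copy each row of input_path to the output file selected by its prefix."""
    out_dir = os.path.dirname(input_path)
    # ExitStack closes every registered file when the block ends,
    # no matter how many files were opened.
    with ExitStack() as stack:
        in_f = stack.enter_context(open(input_path, encoding="utf-8", newline=""))
        reader = csv.reader(in_f, delimiter=delimiter)
        header = next(reader)

        writers = {}
        for prefix, (name, condition) in routes.items():
            out_f = stack.enter_context(
                open(os.path.join(out_dir, name), "w", encoding="utf-8", newline="")
            )
            writer = csv.writer(out_f, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL)
            writer.writerow(header)
            writers[prefix] = (writer, condition)

        for line in reader:
            if not line:  # skip blank lines, which csv.reader yields as []
                continue
            # A single dictionary lookup replaces the whole if/elif chain.
            target = writers.get(line[0][:2])
            if target is not None and target[1](line):
                target[0].writerow(line)
```

Calling split_csv(r'C:\myfolder\large_file.csv') would then write the output files next to the input file. Rows whose prefix has no entry in the table (12 and 14 in the sample data) are simply skipped, which matches the behaviour of the if statements in the question.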
