I have two documents: one with specific numbers, and another that maps each number to its definition, with the two fields delimited by a pipe. The first document sometimes contains more than one number per row, so it has more columns than the second, which has only two. I tried merging them with the "on" parameter, changing the lookup column names in a "for" loop and feeding the merged result back in on each iteration. The problem is that the merge deletes the rows that don't have a value in every one of those columns. The columns are named restriction_n1-16.

import pandas as pd


def merge_res(number, last_iter=None):
    res_n = f"restrict_n{number}"
    res_d = f"restrict_d{number}"
    # Re-read the lookup file, naming its two columns after the current index.
    restrict_desc_csv = pd.read_csv(
        RESTRICTION_DESC,
        sep="|",  # `delimiter` is an alias of `sep`; passing both is redundant
        header=None,
        names=[res_n, res_d],
        dtype=object,
    )
    # The first iteration starts from the original frame, later ones from the previous result.
    left = restrict_csv if number == 1 else last_iter
    # No `how` is given, so this is an inner merge: rows without a match are dropped.
    merge = left.merge(restrict_desc_csv, on=res_n)
    return merge


last_iter = merge_res(1)
for i in range(2, 15):
    last_iter = merge_res(i, last_iter)

Daniel
  • Please see [`How to make good reproducible pandas examples`](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – sushanth Aug 28 '20 at 12:36
  • I know I could have included at least some sort of pseudocode, but I was writing this in a hurry. Anyway, @Vladimír seems to have resolved my question. – Daniel Aug 28 '20 at 12:43

1 Answer

I am not going to give the whole code, as it is not that difficult to write. Write it as you described in your question, but specify the `how` parameter (see pandas.DataFrame.merge). The default is 'inner', which causes the lost rows, because it keeps only rows whose key exists in both dataframes. From your description of the problem, you need to set how='left'.
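For illustration, here is a minimal sketch of that idea on made-up data. The two small frames, the lookup column names `n` and `d`, and the helper `add_descriptions` are assumptions standing in for the files in the question, not the asker's actual data or code.

```python
import pandas as pd

# Hypothetical stand-ins for the two files in the question.
restrict = pd.DataFrame(
    {
        "restrict_n1": ["1", "2", "3"],
        # None: the row has no second number; "9": a number with no definition.
        "restrict_n2": ["2", None, "9"],
    },
    dtype=object,
)
desc = pd.DataFrame(
    {"n": ["1", "2", "3"], "d": ["one", "two", "three"]},
    dtype=object,
)


def add_descriptions(frame, lookup, count):
    """Attach restrict_d1..restrict_d<count> without dropping any rows."""
    out = frame
    for i in range(1, count + 1):
        # Rename the lookup columns so they line up with the i-th number column.
        renamed = lookup.rename(
            columns={"n": f"restrict_n{i}", "d": f"restrict_d{i}"}
        )
        # how="left" keeps every row of `out`; unmatched keys get NaN.
        out = out.merge(renamed, on=f"restrict_n{i}", how="left")
    return out


# Default (inner) merge on the second column, for comparison.
inner_rows = len(
    restrict.merge(
        desc.rename(columns={"n": "restrict_n2", "d": "restrict_d2"}),
        on="restrict_n2",
    )
)

merged = add_descriptions(restrict, desc, 2)
print(inner_rows, len(merged))
print(merged)
```

With the inner merge only the row whose second number has a definition survives, while the left merge keeps all rows and leaves the missing descriptions empty. This assumes each number appears once in the lookup file; duplicate keys there would duplicate rows in the result.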

Vladimír Kunc