0

I am new to both Python and Stack Overflow.

I extract from a csv file a few columns into an interim csv file and clean up the data to remove the nan entries. Once I have extracted them, I endup with below two csv files.

Main CSV File:

Sort,Parent 1,Parent 2,Parent 3,Parent 4,Parent 5,Name,Parent 6

1,John,,,Ned,,Dave

2,Sam,Mike,,,,Ken

3,,,Pete,,,Steve

4,,Kerry,,Rachel,,Rog

5,,,Laura,Mitchell,,Kim

Extracted CSV:

Name,ParentNum

Dave,Parent 4

Ken,Parent 2

Steve,Parent 3

Rog,Parent 4

Kim,Parent 4

What I am trying to accomplish is that I would like to recurse through main csv using the name and parent number. But, if I write a for loop it prints empty rows because it is looking up every row for the first value. What is the best approach instead of for loop. I tried dictionary reader to read scv but could not get far. Any help will be appreciated.

CODE:

import xlrd
import csv
import pandas as pd

print('Opening and Reading the msl sheet from the xlsx file')
with xlrd.open_workbook('msl.xlsx') as wb:
    sh = wb.sheet_by_index(2)
    print("The sheet name is :", sh.name)
    with open(msl.csv, 'w', newline="") as f:
        c = csv.writer(f)
        print('Writing to the CSV file')
        for r in range(sh.nrows):
            c.writerow(sh.row_values(r))

df1 = pd.read_csv(msl.csv, index_col='Sort')
with open('dirty-processing.csv', 'w', newline="") as tbl_writer1:
    c2 = csv.writer(tbl_writer1)
    c2.writerow(['Name','Parent'])
    for list_item in first_row:
        for item in df1[list_item].unique():
            row_content = [item, list_item]
            c2.writerow(row_content)

Expected Result: Input Main CSV:

enter image description here

In the above CSV, I would like to grab unique values from each column into a separate file or any other data type. Then also capture the header of the column they are taken from.

Ex: Negarnaviricota,Phylum Haploviricotina,Subphylum ... so on

Next thing is would like to do is get its parent. Which is where I am stuck. Also, as you can see not all columns have data, so I want to get the last non-blank column. Up to this point everything is accomplished using the above code. So the sample output should look like below.

enter image description here

stovfl
  • 14,998
  • 7
  • 24
  • 51
Mo Ziauddin
  • 380
  • 1
  • 13
  • Would really help if you could give an expected input and output on what you want to find recursively. – crazyGamer Jan 27 '19 at 02:45
  • Hi crazyGamer, I have added the sample input file and an example output... Sorry if the requirement is vague. I am trying to build a parent child for each item in the CSV, regardless of what level it is in hierarchy. – Mo Ziauddin Jan 27 '19 at 05:03
  • Why is Haploviricotina the parent of itself? Could you explain how this was calculated? – crazyGamer Jan 27 '19 at 06:09
  • hi @crazyGamer... Sorry about the confusion. That was a error on my part. I have updated the main CSV image and sample output again. They are formed in following way. 1. Grab unique names from each column along with the heading of that column. 2. Get the parent of each name using the relationship in the main csv sheet. 3. Identifying parent is simple. The parent of Beihai yingvirus is, yingvirus(genus) whose parent is Qinviridae and so on – Mo Ziauddin Jan 27 '19 at 09:35
  • @MoZiauddin: `pandas` can handle `xlsx` files. Read [read-excel-in-pandas](https://stackoverflow.com/questions/26474693/excelfile-vs-read-excel-in-pandas) – stovfl Jan 27 '19 at 10:26
  • @stovfl Thanks. I am using `pandas`. Are you saying that i can accomplish this using `parse("Sheet1")` rather than `csv`? – Mo Ziauddin Jan 27 '19 at 19:51
  • use `df = pd.read_excel(xlsx, sheetname)` and see what you get. – stovfl Jan 27 '19 at 19:54
  • @stovfl - I can read the data into a data frame and grab the unique values from each column like I was able to do it in csv. But I need to be able to extract the parents... – Mo Ziauddin Jan 27 '19 at 20:23
  • @MoZiauddin: Do [remove-null-columns-in-a-dataframe-pandas](https://stackoverflow.com/questions/10857924/remove-null-columns-in-a-dataframe-pandas) then [transpose-and-widen-data](https://stackoverflow.com/questions/49423787/transpose-and-widen-data) – stovfl Jan 27 '19 at 22:26
  • @stovfl I am able to transpose but what I am unable to do is reference across multiple columns in same row – Mo Ziauddin Jan 30 '19 at 04:24
  • @MoZiauddin: Open a **new** Questions with *unable to reference across multiple columns*. Don't forget to **tag** with `pandas` and provide your example date **NOT AS IMAGE** means, show a [mcve]. – stovfl Jan 30 '19 at 09:07

0 Answers0