Pandas dataframe conditional substitution and columnwise trimming

Question

fn1 = pd.DataFrame([['A', 'NaN', 'NaN', 9, 6], ['B', 'NaN', 2, 'NaN', 7], ['C', 3, 2, 'NaN', 10], ['D', 'NaN', 7, 'NaN', 'NaN'], ['E', 'NaN', 'NaN', 3, 3], ['F', 'NaN', 'NaN', 7,'NaN']], columns = ['Symbol', 'Condition1','Condition2', 'Condition3', 'Condition4'])

fn1.set_index('Symbol', inplace=True)



         Condition1 Condition2 Condition3 Condition4
Symbol                                            
A             NaN        NaN          9          6
B             NaN          2        NaN          7
C               3          2        NaN         10
D             NaN          7        NaN        NaN
E             NaN        NaN          3          3
F             NaN        NaN          7        NaN

I'm currently working with a Pandas DataFrame that looks like the link above. I'm trying to go column by column to substitute values that are not 'NaN' with the 'Symbol' associated with that row then collapse each column (or write to a new DataFrame) so that each column is a list of 'Symbol's that were present for each 'Condition' as shown in the desired output:

Desired Output

I've been able to get the 'Symbols' that were present for each condition into a list of lists (see below) but want to maintain the same column names and had trouble adding them to an ever-growing new DataFrame because the lengths are variable and I'm looping through columns.

ls2 = []
for col in fn1.columns:
    fn2 = fn1[fn1[col] > 0]
    ls2.append(list(fn2.index))

Where fn1 is the DataFrame that looks like the first image and I had made the 'Symbol' column the index.

Thank you in advance for any help.

Hello, please see [how to create a good reproducible pandas example](https://stackoverflow.com/a/20159305/4333359), that way you're more likely to get your question answered and avoid down votes. It's more tedious to reproduce your data from images. — ALollz, Jan 18 '19 at 20:45

Joe Patten · Answer 1 · 2019-01-18T22:32:16.753

You can map the symbols to each of the columns, and then take the set of non-null values.

df = fn1.apply(lambda x: x.map(fn1['Symbol'].to_dict()))
condition_symbols =  {col:sorted(list(set(fn1_symbols[col].dropna()))) for col in fn1.columns[1:]}

This will give you a dictionary:

{'Condition1': ['B', 'D'],
 'Condition2': ['C', 'H'],
 'Condition3': ['D', 'H', 'J'],
 'Condition4': ['D', 'G', 'H', 'K']}

I know you asked for a Dataframe, but since the length for each list is different, it would not make sense to make it into a Dataframe. If you wanted a Dataframe, then you could just run this code:

pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in condition_symbols.items() ]))

This gives you the following output:

            Condition1  Condition2  Condition3  Condition4
0           B           C           D           D
1           D           H           H           G
2           NaN         NaN         J           H
3           NaN         NaN         NaN         K

score 0 · Accepted Answer · answered Jan 18 '19 at 22:22

Another answer would be slicing, just like below (explanations in comments):

import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict({
    "Symbol": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"],
    "Condition1": [1, np.nan, 3, np.nan, np.nan, np.nan, 7, np.nan, np.nan, 8, 12],
    "Condition2": [np.nan, 2, 2, 7, np.nan, np.nan, 5, 11, 14, np.nan, np.nan],
    }
)


new_df = pd.concat(
    [
        df["Symbol"][df[column].notnull()].reset_index(drop=True) # get columns without null and ignore the index (as your output suggests)
        for column in list(df)[1:] # Iterate over all columns except "Symbols"
    ],
    axis=1, # Column-wise concatenation
)
# Rename columns
new_df.columns = list(df)[1:]
# You can leave NaNs or replace them with empty string, your choice
new_df.fillna("", inplace=True)

Output of this operation will be:

  Condition1 Condition2
0          a          b
1          c          c
2          g          d
3          j          g
4          k          h
5                     i

If you need any further clarification, post a comment down below.

Pandas dataframe conditional substitution and columnwise trimming

2 Answers2