I have a DataFrame with two rows:

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

After appending 8 rows that contain only the group value, it looks like this:

df = df.append(pd.DataFrame({'group': ['c'] * 8}, index=[0] * 8))

  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN

What I want

Replace the NaN values in the sequence columns (seq_col, seq_col_2, seq_col_3, etc.) with a list of my own.

Note:

  • This data has only 2 sequence columns, but there could be many more.
  • Lists already in the columns must not be replaced, ONLY the NaNs.

I could not find a solution that replaces NaN with a user-provided list value, for example one taken from a dictionary.

Pseudo Code:

for each key, value in dict,
   for each column in df
       if column matches key in dict
         # here matches means the 'seq_col_n' key of dict matched the df 
         # column named 'seq_col_n'
         replace NaN with value in seq_col_n (which is a list of numbers)

I tried the code below. It works for the first column you pass, but not for the second, which is weird.

df.loc[df['seq_col'].isnull(),['seq_col']] = df.loc[df['seq_col'].isnull(),'seq_col'].apply(lambda m: fill_values['seq_col'])

The above works for seq_col, but running it again for seq_col_2 gives weird results.

Expected output, given this input parameter:

my_dict = {'seq_col': [1,2,3], 'seq_col_2': [6,7,8]}

# after executing the code described by the pseudocode, it should look like
  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
  • Can you show the expected output? Also, what results do you get with your code? – harvpan Jul 13 '18 at 15:26
  • Nice, finally someone who posted at least an executable code example! Unluckily I can't help you, but I'll upvote your question for that. But as Harv mentioned: an expected output would help a lot. – JE_Muc Jul 13 '18 at 15:28
  • Do you basically want to convert the 10 values in those 2 lists into 10 individual values for each row in those columns? If so, what would you want to do for the columns without lists? – ALollz Jul 13 '18 at 15:38
  • link may help https://stackoverflow.com/questions/48197234/explode-stack-a-series-of-strings/48197300#48197300 – BENY Jul 13 '18 at 15:41
  • Is this what you're looking for? https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.fillna.html – xyzjayne Jul 13 '18 at 15:41
  • @HarvIpan: Added expected output. – annonymous_guy Jul 13 '18 at 15:49
  • @ALollz: I will replace the other NaNs, which don't hold lists, with 0 or something later. Let's not worry about that for now – annonymous_guy Jul 13 '18 at 15:51

1 Answer

With input arrays whose length matches the number of null rows, you can use pd.DataFrame.loc with pd.Series.isnull:

import pandas as pd, numpy as np

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

df = df.append(pd.DataFrame({'group': ['c']*8}, index=[0] * 8))

L1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
L2 = np.array([10, 11, 12, 13, 14, 15, 16, 17])

df.loc[df['seq_col'].isnull(), 'seq_col'] = L1
df.loc[df['seq_col_2'].isnull(), 'seq_col_2'] = L2

print(df[['seq_col', 'seq_col_2']])

           seq_col        seq_col_2
0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0                0               10
0                1               11
0                2               12
0                3               13
0                4               14
0                5               15
0                6               16
0                7               17

If you need list values in your series, then you can convert to a series explicitly before assignment:

df.loc[df['seq_col'].isnull(), 'seq_col'] = pd.Series([[1, 2, 3]]*len(df))
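
For the dictionary-driven case in the question's pseudocode, something like the following may work. This is only a sketch, not a tested general solution: fill_nan_with_list is a hypothetical helper name, the frame is trimmed to a few columns, and pd.concat stands in for df.append on the assumption of a recent pandas version where append is no longer available. Each matching column is rebuilt row by row so that only NaN cells receive the list and existing lists stay untouched.

```python
import pandas as pd

# Rebuild a frame shaped like the one in the question: 2 filled rows, 8 NaN rows.
df = pd.concat([
    pd.DataFrame({'group': ['c'] * 2,
                  'num_column': range(2),
                  'seq_col': [[1, 2, 3, 4, 5]] * 2,
                  'seq_col_2': [[1, 2, 3, 4, 5]] * 2}),
    pd.DataFrame({'group': ['c'] * 8}, index=[0] * 8),
])

fill_values = {'seq_col': [1, 2, 3], 'seq_col_2': [6, 7, 8]}


def fill_nan_with_list(frame, fills):
    """Return a copy of frame where NaN cells in the columns named by
    fills are replaced with the corresponding list (hypothetical helper)."""
    out = frame.copy()
    for col, fill in fills.items():
        if col not in out.columns:
            continue  # dictionary key has no matching column
        mask = out[col].isnull()
        # list(fill) gives every row its own copy of the list.
        values = [list(fill) if is_nan else v
                  for v, is_nan in zip(out[col], mask)]
        # Reusing out.index keeps the duplicated index labels as they are.
        out[col] = pd.Series(values, index=out.index, dtype=object)
    return out


result = fill_nan_with_list(df, fill_values)
print(result[['seq_col', 'seq_col_2']])
```

Building the column as an object Series with the frame's own index sidesteps index alignment, which is easy to get wrong here because the label 0 is repeated.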
jpp