Given a JSON string of records where the "schema" is not consistent across records (e.g. some records do not have the full set of "columns"):

s = """[{"a": 3, "b":[]}, {"a": 4, "b": [4]}, {"a": 5}]"""

A pandas DataFrame can be constructed from this string:

import pandas as pd
import json

json_df = pd.DataFrame.from_records(json.loads(s))

Which results in

   a    b
0  3   []
1  4  [4]
2  5  NaN

How can all NaN instances of a pandas Series column be filled with empty list values? The expected resulting DataFrame would be:

   a    b
0  3   []
1  4  [4]
2  5   []

I have tried the following, none of which worked:

json_df[json_df.b.isna()] = [[]]*json_df[json_df.b.isna()].shape[0]

from itertools import repeat
json_df[json_df.b.isna()] = repeat([], json_df[json_df.b.isna()].shape[0])

import numpy as np
json_df[json_df.b.isna()] = np.repeat([], json_df[json_df.b.isna()].shape[0])

Thank you in advance for your consideration and response.

Ramón J Romero y Vigil

1 Answer

First select the rows where b is NaN, then assign a Series of empty lists of the same shape to those rows:

json_df.loc[json_df.b.isnull(), 'b'] = json_df.loc[json_df.b.isnull(), 'b'].apply(lambda x: [])


   a    b
0  3   []
1  4  [4]
2  5   []
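
For reference, the steps above can be run end to end roughly as follows. This is only a sketch: the variable name `mask` is introduced here for readability and is not part of the original answer. Because the lambda is called once per selected row, each NaN should get its own new list rather than a shared one.

```python
import json
import pandas as pd

s = """[{"a": 3, "b":[]}, {"a": 4, "b": [4]}, {"a": 5}]"""
json_df = pd.DataFrame.from_records(json.loads(s))

# Boolean mask of the rows where column "b" is missing
# ("mask" is a name chosen for this sketch).
mask = json_df["b"].isnull()

# apply() calls the lambda once per selected row, building a new list each
# time; the resulting Series is aligned on the index when assigned via .loc.
json_df.loc[mask, "b"] = json_df.loc[mask, "b"].apply(lambda x: [])

print(json_df)
```

Only column b is touched here, because the column label is passed to .loc together with the row mask.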
galaxyan
  • https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – ansev Mar 31 '20 at 19:36
  • You are simply assigning [] to the value of that row, and using apply to do it is unnecessary – ansev Mar 31 '20 at 19:38
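
Following up on the comments, one possible way to avoid apply is to rebuild the column with a plain list comprehension. This is a sketch under the assumption that every non-missing value in b is already a list, as in the question's data; it is not taken from the answer or the comments.

```python
import json
import pandas as pd

s = """[{"a": 3, "b":[]}, {"a": 4, "b": [4]}, {"a": 5}]"""
json_df = pd.DataFrame.from_records(json.loads(s))

# Keep existing lists as they are; replace anything else (here, NaN) with a
# new empty list. Assumes valid entries in "b" are always lists.
filled = [x if isinstance(x, list) else [] for x in json_df["b"]]

# Wrap in a Series with the original index so the assignment stays aligned.
json_df["b"] = pd.Series(filled, index=json_df.index)

print(json_df)
```

Writing `[]` inside the comprehension creates a separate list for each row, unlike `[[]] * n`, which repeats a reference to one list object.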