
The following code works, but it requires three passes over the dataframe and is very slow. Is there a better way to do this?

df['raw_results'] = df['raw_results'].replace('{}', '{"PhysicalDisks":[{"Status":"NaN","Name":"NaN"}]}')
df['raw_results'] = df['raw_results'].replace('{"error":8004}', '{"PhysicalDisks":[{"Status":"error","Name":"NaN"}]}')
df['raw_results'] = df['raw_results'].replace('{"error":8003}', '{"PhysicalDisks":[{"Status":"error","Name":"NaN"}]}')
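As a point of comparison, pd.Series.replace also accepts a dict of {old: new} values, so the three calls above can be collapsed into one. This is a minimal sketch on a small hypothetical sample frame; it has not been benchmarked against the original three calls.

```python
import pandas as pd

# Hypothetical sample data mirroring the strings in the question.
df = pd.DataFrame({"raw_results": [
    "{}",
    '{"error":8004}',
    '{"error":8003}',
    '{"PhysicalDisks":[{"Status":"OK","Name":"Disk0"}]}',
]})

EMPTY = '{"PhysicalDisks":[{"Status":"NaN","Name":"NaN"}]}'
ERROR = '{"PhysicalDisks":[{"Status":"error","Name":"NaN"}]}'

# One replace call with an {old: new} mapping instead of three separate calls.
# With the default regex=False, only whole values that exactly equal a key are replaced.
df["raw_results"] = df["raw_results"].replace({
    "{}": EMPTY,
    '{"error":8004}': ERROR,
    '{"error":8003}': ERROR,
})
print(df["raw_results"].tolist())
```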

Update

This works much faster, but it would still be better if the errors were matched with something like a regex, so that different error codes are handled:

import numpy as np

df['raw_results'] = np.where(df.raw_results == '{}', '{"PhysicalDisks":[{"Status":"NaN","Name":"NaN"}]}', df.raw_results)
df['raw_results'] = np.where(df.raw_results == '{"error":8004}', '{"PhysicalDisks":[{"Status":"error","Name":"NaN"}]}', df.raw_results)
df['raw_results'] = np.where(df.raw_results == '{"error":8003}', '{"PhysicalDisks":[{"Status":"error","Name":"NaN"}]}', df.raw_results)
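One way to get the regex behaviour asked for in the update is to swap the chained np.where calls for a single np.select, using Series.str.fullmatch (pandas 1.1 or later) to catch any numeric error code. This is a sketch under the assumption that error rows always look exactly like {"error":<digits>} with no extra whitespace; the sample data, including the code 9999, is hypothetical.

```python
import numpy as np
import pandas as pd

EMPTY = '{"PhysicalDisks":[{"Status":"NaN","Name":"NaN"}]}'
ERROR = '{"PhysicalDisks":[{"Status":"error","Name":"NaN"}]}'

# Hypothetical sample data; 9999 stands in for an error code not seen before.
df = pd.DataFrame({"raw_results": [
    "{}",
    '{"error":8004}',
    '{"error":9999}',
    '{"PhysicalDisks":[{"Status":"OK","Name":"Disk0"}]}',
]})

s = df["raw_results"]
is_empty = s.eq("{}")
# fullmatch anchors the pattern to the whole string; \d+ accepts any numeric code.
# na=False keeps the mask boolean even if the column contains missing values.
is_error = s.str.fullmatch(r'\{"error":\d+\}', na=False)

# A single np.select replaces the chain of np.where calls:
# the first matching condition wins, otherwise the original value is kept.
df["raw_results"] = np.select([is_empty, is_error], [EMPTY, ERROR], default=s)
print(df["raw_results"].tolist())
```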



Since strings are hashable, you can use a dictionary:

d = {'{}': '{"PhysicalDisks":[{"Status":"NaN","Name":"NaN"}]}',
     '{"error":8004}': '{"PhysicalDisks":[{"Status":"error","Name":"NaN"}]}',
     '{"error":8003}': '{"PhysicalDisks":[{"Status":"error","Name":"NaN"}]}'}

Then map the series through the dictionary and use fillna to replace the unmapped elements (which map turns into NaN) with the original values:

df['raw_results'] = df['raw_results'].map(d).fillna(df['raw_results'])
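To make the two steps concrete, here is the same map + fillna line run on a small hypothetical sample: map yields NaN for any value that is not a key of d, and fillna puts the original value back in those positions.

```python
import pandas as pd

d = {'{}': '{"PhysicalDisks":[{"Status":"NaN","Name":"NaN"}]}',
     '{"error":8004}': '{"PhysicalDisks":[{"Status":"error","Name":"NaN"}]}',
     '{"error":8003}': '{"PhysicalDisks":[{"Status":"error","Name":"NaN"}]}'}

# Hypothetical sample: the last value is not a key in d.
original = '{"PhysicalDisks":[{"Status":"OK","Name":"Disk0"}]}'
df = pd.DataFrame({"raw_results": ["{}", '{"error":8003}', original]})

mapped = df["raw_results"].map(d)                     # NaN where the value is not a key in d
df["raw_results"] = mapped.fillna(df["raw_results"])  # restore the unmapped originals
print(df["raw_results"].tolist())
```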

Related: see "Replace values in a pandas series via dictionary efficiently" for an explanation of why and when pd.Series.map with a dict can outperform pd.Series.replace.

jpp
  • Thanks! However, I don't think `fillna` will work in my case: later I parse this column as JSON, and the parser expects every value to have the format `'{"PhysicalDisks":[{"Status":"something","Name":"something"}]}'`. Also, other datasets I will process may contain error codes other than 8004 or 8003. – Daniel Lennart Oct 16 '18 at 10:34
  • @DanielLennart, Why wouldn't `fillna` work? All it's saying is "if it's not in the dictionary, do nothing". Provided your values are hashable, it *should* work. For efficiency, you should mock up some data so we can test performance; otherwise, any answer is just conjecture. – jpp Oct 16 '18 at 10:38
  • Because the dataset might contain other values that are not in the dict, and I would still want to make them uniform (`'{"PhysicalDisks":[{"Status":"something","Name":"something"}]}'`) rather than leave them as-is. – Daniel Lennart Oct 16 '18 at 13:08
  • @DanielLennart, In that case, IMO, a much better idea is to ditch the dataframe and work with a list of dictionaries. – jpp Oct 16 '18 at 13:20
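Following the list-of-dictionaries suggestion, a minimal sketch might parse each raw string with json.loads and coerce it into the PhysicalDisks shape, so that any error code, malformed value, or unexpected content ends up uniform. The helper name `normalize` and the "NaN" placeholders are assumptions chosen to match the strings in the question; the input list could come from `df['raw_results'].tolist()`.

```python
import json

# Hypothetical placeholders, matching the strings used in the question.
EMPTY = {"Status": "NaN", "Name": "NaN"}
ERROR = {"Status": "error", "Name": "NaN"}


def normalize(raw):
    """Hypothetical helper: coerce one raw_results value into the
    {"PhysicalDisks": [{"Status": ..., "Name": ...}]} shape."""
    try:
        obj = json.loads(raw)
    except (TypeError, ValueError):  # None/NaN (TypeError) or malformed JSON (ValueError)
        obj = None
    if isinstance(obj, dict) and "error" in obj:
        disks = [dict(ERROR)]  # any error code, not only 8003/8004
    elif isinstance(obj, dict) and isinstance(obj.get("PhysicalDisks"), list):
        # Keep only Status and Name, filling missing keys with the placeholder.
        disks = [{"Status": d.get("Status", "NaN"), "Name": d.get("Name", "NaN")}
                 for d in obj["PhysicalDisks"] if isinstance(d, dict)]
    else:
        disks = []
    # Empty objects and anything unrecognised fall back to the NaN placeholder.
    return {"PhysicalDisks": disks or [dict(EMPTY)]}


# Hypothetical raw values, including an unseen error code and invalid JSON.
raw_results = ["{}", '{"error":8004}', '{"error":1234}', "not json",
               '{"PhysicalDisks":[{"Status":"OK","Name":"Disk0","Extra":1}]}']
records = [normalize(r) for r in raw_results]
print(records)
```

If the later parsing step really needs strings rather than dicts, `json.dumps(record, separators=(",", ":"))` produces the compact form used in the question.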