I'm trying to extract a field from a json that contains a list then append that list to a dataframe, but I'm running in to a few different errors.
I think I can write it to a csv then read the csv with Pandas, but I'm trying to avoid writing any files. I know that I can also use StringIO to make a csv, but that has issues with null bytes. Replacing those would be (i think) another line-by-line step that will further extend the time the script takes to complete... i'm running this against a query that returns thens of thousands of results so keeping it fast and simple is a priority
First I tried this:
hit_json = json.loads(hit)
for ln in hit_json.get('hits').get('hits'):
df = df.append(ln['_source'], ignore_index=True)
print(df)
This gives me a result that looks like this:
1 2 3 4
a b d,e,f... x
Then I tried this:
df = df.append(ln['_source']['payload'], ignore_index=True)
But that gives me this error:
TypeError: cannot concatenate object of type "<class 'str'>"; only pd.Series,
pd.DataFrame, and pd.Panel (deprecated) objs are valid
What I'm looking for would be something like this:
0 1 2 3 4
d e f g h
On top of this... I need to figure out a way to handle a specific string in this list that contains a comma... which may be a headache that's best handled in a different question... something like:
# Obviously this is incorrect but I think you get the idea :)
str.replace(',', '^')
except if ',' followed by ' '
Greatly appreciate any help!
EDITING TO ADD JSON AS REQUESTED
{
"_index": "sanitized",
"_type": "sanitized",
"_id": "sanitized".,
"_score": sanitized,
"_source": {
"sanitized": sanitized,
"sanitized": "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,\"34,35\",36,37,38,39,40",
"sanitized": "sanitized",
"sanitized": ["sanitized"],
"sanitized": "sanitized",
"sanitized": "sanitized",
"sanitized": "sanitized",
"sanitized": "sanitized",
}
}]
}
}