I am able to load a csv into pandas dataframe, but it is stuck in a list. How can I load directly into a pandas dataframe from Pydrill or unlist the pandas dataframe columns and data? I've tried unlisting and it puts everything into a list of a list.
I've used the to_dataframe(), but can't seem to find documentation on if I can use a delimeter. pd.dataframe doesn't work because of the Pydrill query.
reviews = drill.query("SELECT * FROM hdfs.datasets.`titanic_ML/titanic.csv` LIMIT 1000", timeout=30)
print(reviews)
import pandas as pd
df2 = reviews.to_dataframe()
df2.rename(columns=df2.iloc[0])
headers = df2.iloc[0]
print(headers)
new_df = pd.DataFrame(df2.values[1:], columns=headers)
new_df.head()
The results cast everything into a list.
["pclass","sex","age","sibsp","parch","fare","embarked","survived"]
0 ["3","1","38.0","0","0","7.8958","1","0"]
1 ["1","1","42.0","0","0","26.55","1","0"]
2 ["3","0","9.0","4","2","31.275","1","0"]
3 ["3","1","27.0","0","0","7.25","1","0"]
4 ["1","1","41.0","0","0","26.55","1","0"]
I'd like to get everything into a normal pandas dataframe.