I'm new to Spark and trying to use it the way I've used Pandas for data analysis.
In pandas, when I want to see a variable, I write the following:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
print(df.head())
In Spark, my print statements are not printed to the terminal. Based on David's comment on this answer, print statements are sent to stdout/stderr, and there is a way to get at them with YARN, but he doesn't say how. Googling "how to capture stdout spark" hasn't turned up anything that makes sense to me.
What I want is a way to see bits of my data so I can troubleshoot my analysis. "Did adding that column work?" That sort of thing. I'd also welcome troubleshooting approaches that are better suited to huge datasets.