
I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the content:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("my.csv")
df.registerTempTable("tasks")
val results = sqlContext.sql("select col from tasks")
results.show()

The col column appears truncated:

scala> results.show();
+--------------------+
|                 col|
+--------------------+
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-06 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:21:...|
|2015-11-16 07:21:...|
|2015-11-16 07:21:...|
+--------------------+

How do I show the full content of the column?

Hongbo Miao
tracer

17 Answers

555

results.show(20, false) will not truncate. Check the source

20 is the default number of rows displayed when show() is called without any arguments.
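For context, this is roughly the truncation rule show applies: with a truncate width of 20 (the default), any cell longer than the width is cut to width - 3 characters and "..." is appended. The helper below is a pure-Python approximation for illustration, not a Spark API:

```python
# Pure-Python approximation of how Spark's showString truncates cell values.
# (Hypothetical helper for illustration; Spark does this internally in Scala.)
def truncate_cell(value: str, width: int = 20) -> str:
    if width <= 0 or len(value) <= width:
        return value                      # truncation disabled or value fits
    if width < 4:
        return value[:width]              # too narrow to fit the "..." marker
    return value[:width - 3] + "..."      # cut and mark as truncated

print(truncate_cell("2015-11-16 07:15:32.000"))     # -> 2015-11-16 07:15:...
print(truncate_cell("2015-11-16 07:15:32.000", 0))  # full value, no truncation
```

This matches the output in the question: 23-character timestamps are cut to 17 characters plus "...", giving the 20-character cells shown.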

GreenGiant
TomTom101
65

If you call results.show(false), the results will not be truncated.

Shubham Chaudhary
Narendra Parmar
41

The code below will show all rows without truncating any column:

df.show(df.count(), False)
MoeChen
  • Same question I asked the prior answerer: does this cause `df` to be collected twice? – WestCoastProjects Apr 19 '18 at 03:51
  • @javadba yes, I think count() will go through df once, and show() will collect df twice. – MoeChen Feb 13 '20 at 20:00
  • As an alternative, you could give a very large number as the first parameter instead of `df.count()` in order to save on the requirement to persist. For example, if the row count of df is 1000, you could do `df.show(1000000, false)` and it will work. Tried the following and it worked: `scala> println(df.count) res2: Long = 987 scala> df.show(990)` – Omkar Neogi Nov 01 '21 at 09:53
23

The other solutions are good. If these are your goals:

  1. No truncation of columns,
  2. No loss of rows,
  3. Fast and
  4. Efficient

These two lines are useful ...

    df.persist
    df.show(df.count, false) // in Scala or 'False' in Python

Persisting (or caching) keeps the interim DataFrame materialized on the executors, so the two actions, count and show, are faster and more efficient: the second action does not have to recompute the DataFrame from scratch. See the Spark documentation on persist and cache for more.
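Conceptually, persist acts like memoization: the first action materializes the partitions, and later actions reuse them instead of recomputing. A rough pure-Python analogy (not the Spark API):

```python
import functools

# Hypothetical stand-in for an expensive DataFrame computation.
@functools.lru_cache(maxsize=None)
def compute_rows():
    compute_rows.calls = getattr(compute_rows, "calls", 0) + 1  # count real computations
    return tuple(f"2015-11-16 07:15:{i:02d}" for i in range(5))

n = len(compute_rows())       # "count": the first action triggers the computation
rows = compute_rows()         # "show": served from the cache, nothing recomputed
print(n, compute_rows.calls)  # -> 5 1
```

Without the cache (i.e. without persist), both actions would recompute the data from scratch, which is exactly why the unpersisted df.show(df.count, false) pattern reads the DataFrame twice.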

codeaperature
12

Use results.show(20, False) in Python, or results.show(20, false) in Java/Scala.

Sai
Deepak Babu P R
11

In PySpark we can use:

df.show(truncate=False), which displays the full content of the columns without truncation.

df.show(5, truncate=False), which displays the full content of the first five rows.
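Under the hood, show maps the truncate argument to a column width, roughly as follows (a sketch of PySpark's dispatch, using a hypothetical helper, not the actual API):

```python
# Sketch of how PySpark's DataFrame.show interprets its truncate argument.
# (Hypothetical helper for illustration only.)
def truncate_width(truncate) -> int:
    if isinstance(truncate, bool):
        return 20 if truncate else 0  # True -> default width 20, False -> off
    return int(truncate)              # an integer sets the width directly

print(truncate_width(True))    # -> 20 (the default)
print(truncate_width(False))   # -> 0, i.e. no truncation
print(truncate_width(5))       # -> 5: cells wider than 5 characters are cut
```

This is also why df.show(truncate=0), suggested in a later answer, behaves the same as truncate=False.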

8

The following answer applies to a Spark Streaming application.

By setting the "truncate" option to false, you can tell the output sink to display the full column.

val query = out.writeStream
          .outputMode(OutputMode.Update())
          .format("console")
          .option("truncate", false)
          .trigger(Trigger.ProcessingTime("5 seconds"))
          .start()
farrellw
4

Within Databricks you can visualize the dataframe in a tabular format with the command:

display(results)


Ignacio Alorre
4

In C#, Option("truncate", false) prevents the data in the output from being truncated.

StreamingQuery query = spark
                    .Sql("SELECT * FROM Messages")
                    .WriteStream()
                    .OutputMode("append")
                    .Format("console")
                    .Option("truncate", false)
                    .Start();
4

Try df.show(20, False)

Note that if you do not specify the number of rows you want to show, it will display 20 rows but will still execute your whole dataframe, which will take more time!

4

In PySpark, remember:

  • if you have to display data from a dataframe, use the show(truncate=False) method.
  • if you have to display data from a streaming dataframe view (Structured Streaming), use writeStream.format("console").option("truncate", False).start() with the option set.

Hope this helps someone.

ngenne
3

Try this command:

df.show(df.count())
epic_last_song
3

results.show(false) will show you the full column content.

The show method limits output to 20 rows by default; adding a number before false will show more rows.

OneCricketeer
3

results.show(20,false) did the trick for me in Scala.

zero323
SKA
3

Tried this in PySpark:

df.show(truncate=0)
onemanarmy
1

PYSPARK

In the code below, df is the name of the dataframe. The first parameter shows all rows in the dataframe dynamically rather than hardcoding a numeric value; the second parameter, set to False, displays full column contents.

df.show(df.count(),False)



SCALA

In the code below, df is the name of the dataframe. The first parameter shows all rows in the dataframe dynamically rather than hardcoding a numeric value; the second parameter, set to false, displays full column contents.

df.show(df.count().toInt,false)


Sarath Subramanian
0

Try this in Scala:

df.show(df.count.toInt, false)

The show method accepts an Int and a Boolean, but df.count returns a Long, so a type cast is required.

Pritesh Kumar