Difference between tf.data.TextLineDataset and tf.data.experimental.make_csv_dataset

Question

Open a Google Colab notebook and run the statements below:

import tensorflow as tf
import pathlib
import os
dataset = tf.data.TextLineDataset('/content/sample_data/california_housing_test.csv')
dataset ## output is <TextLineDatasetV2 shapes: (), types: tf.string>

Then run the following:

import tensorflow as tf
import pathlib
import os
dataset = tf.data.experimental.make_csv_dataset('/content/sample_data/california_housing_test.csv',batch_size=5)
dataset ## output is <PrefetchDataset shapes: OrderedDict([(longitude, (5,)), (latitude, (5,)), (housing_median_age, (5,)), (total_rooms, (5,)), (total_bedrooms, (5,)), (population, (5,)), (households, (5,)), (median_income, (5,)), (median_house_value, (5,))]), types: OrderedDict([(longitude, tf.float32), (latitude, tf.float32), (housing_median_age, tf.float32), (total_rooms, tf.float32), (total_bedrooms, tf.float32), (population, tf.float32), (households, tf.float32), (median_income, tf.float32), (median_house_value, tf.float32)])>

Clearly there is a huge difference in the way tf.data.TextLineDataset and tf.data.experimental.make_csv_dataset handle a text file. Why does TensorFlow have both, with one under experimental and the other outside it?

score 1 · Answer 1 · answered Jun 25 '21 at 13:49

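tf.data.TextLineDataset loads text from text files and creates a dataset in which each line of the files becomes one element, a scalar tf.string tensor. It does not parse the line, so for a CSV file the header and the comma-separated values come through as raw text.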
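As a minimal sketch of this line-by-line behaviour, the snippet below writes a tiny CSV to a temporary file and reads it back. The file name, its three columns, and the parse_line helper are illustrative assumptions, not part of the question's data; the parsing step uses tf.io.decode_csv with float defaults.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical miniature CSV standing in for the Colab sample file.
csv_text = (
    "longitude,latitude,median_house_value\n"
    "-122.05,37.37,344700.0\n"
    "-118.30,34.26,176500.0\n"
    "-117.81,33.78,270500.0\n"
)
csv_path = os.path.join(tempfile.mkdtemp(), "tiny_housing.csv")
with open(csv_path, "w") as f:
    f.write(csv_text)

# Every line, including the header, becomes one scalar string element.
lines = tf.data.TextLineDataset(csv_path)
raw_lines = [line.decode("utf-8") for line in lines.as_numpy_iterator()]


def parse_line(line):
    # Split one CSV line into three float32 scalars (assumed column types).
    return tuple(tf.io.decode_csv(line, record_defaults=[0.0, 0.0, 0.0]))


# Parsing is left to the caller: skip the header, then decode each line.
parsed = lines.skip(1).map(parse_line)
first_row = [float(value.numpy()) for value in next(iter(parsed))]

print(raw_lines[0])
print(first_row)
```

Skipping the header and decoding each line by hand is the work that a CSV-specific reader would otherwise do for you.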

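tf.data.experimental.make_csv_dataset, by contrast, reads CSV files into a dataset in which each element corresponds to a batch of CSV rows. Each element is a dictionary mapping column names to tensors of length batch_size, as in the output shown in the question; if label_name is passed, each element is instead a (features, labels) tuple. The file_pattern argument should be a list of files or file-path patterns containing CSV records.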
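The following sketch shows what make_csv_dataset yields with and without label_name. The small CSV file and its column names are again hypothetical; num_epochs=1 and shuffle=False are set only to keep the example finite and its order stable.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical miniature CSV standing in for the Colab sample file.
csv_text = (
    "longitude,latitude,median_house_value\n"
    "-122.05,37.37,344700.0\n"
    "-118.30,34.26,176500.0\n"
    "-117.81,33.78,270500.0\n"
    "-118.36,33.82,330000.0\n"
)
csv_path = os.path.join(tempfile.mkdtemp(), "tiny_housing.csv")
with open(csv_path, "w") as f:
    f.write(csv_text)

# Without label_name, each element is a dict: column name -> batched tensor.
features_only = tf.data.experimental.make_csv_dataset(
    csv_path, batch_size=2, num_epochs=1, shuffle=False
)
batch = next(iter(features_only))
column_names = list(batch.keys())
longitude_shape = tuple(batch["longitude"].shape)

# With label_name, each element is a (features, labels) tuple instead.
with_labels = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=2,
    label_name="median_house_value",
    num_epochs=1,
    shuffle=False,
)
features, labels = next(iter(with_labels))

print(column_names, longitude_shape)
print(list(features.keys()), tuple(labels.shape))
```

Here the header row supplies the column names and the column types are inferred from the data, which is why the question's output lists nine named float32 columns rather than a single string.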

The experimental namespace (here tf.data.experimental) indicates that the class or function is in early development, incomplete, or, less commonly, not up to standards. For more information, you can refer to this answer.
