1

I'm using Python 2.7 and have a TSV formatted as follows (368 rows × 3 columns):

date    dayOfWeek    pageviews
2016    4            3920
...

I have a Jupyter notebook saved in the same location as the TSV. I'm running this code:

import pandas as pd
pd.read_table('query_explorer.tsv')

I get back a DataFrame that's 736 rows × 3 columns and filled with NaNs. Oddly, that is exactly twice the 368 rows I expect.

Any idea what's going on here?

smci
anon_swe

2 Answers

4

How about splitting on any run of whitespace instead of on tabs:

pd.read_table('query_explorer.tsv', delim_whitespace=True, header=0)
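
To illustrate why this might help, here is a minimal sketch that assumes the columns are separated by runs of spaces rather than real tab characters. The sample text is made up for illustration and is not the asker's file. `read_table` splits on tabs by default, so a space-separated line ends up in a single column. `sep=r"\s+"` is the documented equivalent of `delim_whitespace=True`.

```python
# The __future__ import is intended to keep the literals unicode on Python 2.7;
# it is a no-op on Python 3.
from __future__ import print_function, unicode_literals

import io

import pandas as pd

# Hypothetical sample: columns separated by runs of spaces, not tabs.
sample = (
    "date    dayOfWeek    pageviews\n"
    "2016    4            3920\n"
    "2016    5            4100\n"
)

# Default separator is a tab, so each line is read as one wide column.
by_tab = pd.read_table(io.StringIO(sample))

# Any run of whitespace as the separator (same effect as delim_whitespace=True).
by_space = pd.read_table(io.StringIO(sample), sep=r"\s+", header=0)

print(by_tab.shape)
print(by_space.shape)
print(list(by_space.columns))
```

Comparing the two shapes on a few lines of the real file is a quick way to tell whether the delimiter is the problem.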
suvy
1

In CSV files the comma is the separator; in TSV files the tab character separates the fields. pandas splits each line into columns according to the separator it is given, so pass the tab explicitly:

import pandas as pd
pd.read_csv('query_explorer.tsv', sep="\t")
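
As a rough check of this approach, the sketch below parses a small in-memory sample that really is tab-separated (invented data, not the asker's file). It also compares the result with plain `read_table`, whose default separator is already a tab.

```python
# The __future__ import is intended to keep the literals unicode on Python 2.7;
# it is a no-op on Python 3.
from __future__ import print_function, unicode_literals

import io

import pandas as pd

# Hypothetical sample with real tab characters between the fields.
sample = "date\tdayOfWeek\tpageviews\n2016\t4\t3920\n2016\t5\t4100\n"

# Explicit tab separator with read_csv.
df_csv = pd.read_csv(io.StringIO(sample), sep="\t")

# read_table uses a tab separator by default.
df_table = pd.read_table(io.StringIO(sample))

print(df_csv.shape)
print(df_csv.equals(df_table))

# repr() makes delimiters visible: tabs show up as \t, spaces stay as spaces.
print(repr(sample.splitlines()[0]))
```

If both calls return the same NaN-filled frame on the real file, the fields are probably not separated by plain tabs. Printing `repr()` of the first raw line, as in the last statement, is one way to see what the delimiter actually is.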
marc_s
sha_hla