When reading a file without headers, existing answers correctly say that the header= parameter should be set to None, but none explain why. By default, header='infer', which behaves like header=0 when names= is not passed, so the first row of the file is used as the header. For example, the following code loses the first row of data: that row is read as the header, and the header is then replaced by col_names.
Note that the columns are assumed to be separated by a single space ' ' here.
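import pandas as pd

col_names = ["Sequence", "Start", "End", "Coverage"]
df = pd.read_csv("path/to/file.txt", sep=' ')  # <--- wrong: first data row becomes the header
df.columns = col_names  # replaces that header, so the first data row is lost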
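To see the lost row concretely, here is a minimal sketch that reads from an in-memory buffer instead of a real file. The two sample rows and the variable names raw and wrong are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row, space-separated "file" with no header line.
raw = "chr1 100 200 0.5\nchr2 300 400 0.9\n"
col_names = ["Sequence", "Start", "End", "Coverage"]

# Default header handling: the first line is consumed as the header.
wrong = pd.read_csv(io.StringIO(raw), sep=" ")
print(wrong.columns.tolist())  # labels come from the first data row

# Renaming the columns discards those labels, i.e. the "chr1" row.
wrong.columns = col_names
print(len(wrong))  # one row left out of two
print(wrong)
```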
To get the correct output, you'll need to set header=None:
df = pd.read_csv("path/to/file.txt", sep=' ', header=None) # <--- OK
df.columns = col_names
or use the names= parameter to assign column names in one function call:
df = pd.read_csv("path/to/file.txt", sep=' ', names=col_names) # <--- OK
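As a quick check that both variants keep every row and give the same result, here is a small sketch using the same kind of made-up in-memory data. The names raw, df_a and df_b are illustrative only.

```python
import io

import pandas as pd

# Hypothetical two-row, space-separated "file" with no header line.
raw = "chr1 100 200 0.5\nchr2 300 400 0.9\n"
col_names = ["Sequence", "Start", "End", "Coverage"]

# Variant 1: read without a header, then assign the names.
df_a = pd.read_csv(io.StringIO(raw), sep=" ", header=None)
df_a.columns = col_names

# Variant 2: pass the names directly; no row is treated as a header.
df_b = pd.read_csv(io.StringIO(raw), sep=" ", names=col_names)

print(df_a.shape, df_b.shape)  # both keep the two data rows
print(df_a.columns.tolist() == df_b.columns.tolist())
print(df_b)
```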
The header=None approach is often preferred when the number of columns is not known in advance (with names=, len(col_names) has to match the number of columns in the file) or when the specific column names are not important. With header=None the columns get default integer labels 0, 1, 2, and so on. For example, calling add_prefix() after read_csv adds a prefix to those default labels:
df = pd.read_csv("path/to/file.txt", sep=' ', header=None).add_prefix('col')
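A minimal sketch of what add_prefix() does to the default labels, again on made-up in-memory data (raw and prefixed are illustrative names):

```python
import io

import pandas as pd

# Hypothetical two-row, space-separated "file" with no header line.
raw = "chr1 100 200 0.5\nchr2 300 400 0.9\n"

df = pd.read_csv(io.StringIO(raw), sep=" ", header=None)
print(df.columns.tolist())  # [0, 1, 2, 3]

# add_prefix() returns a new DataFrame whose labels are strings.
prefixed = df.add_prefix("col")
print(prefixed.columns.tolist())  # ['col0', 'col1', 'col2', 'col3']
```

Since add_prefix() works on whatever labels are present, this needs no knowledge of how many columns the file has.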