
I have a .csv file with headers.

I am trying to delete the header row and then open the same file for reading.

But the first line read is still the header line. How do I delete the header line and start reading from the first line of data?

Code snippet:

import pandas as pd
from subprocess import Popen, PIPE

# Sort the cleaned file on r2
df = pd.read_csv(cleaned_file + ".csv", names=['r2','r5','r7','r12','r15','r70','r83'])
sorted_df = df.sort_values(by=["r2"], ascending=True)
sorted_df.to_csv(cleaned_file_sorted_on_ts + '.csv', index=False)

# Remove the header line from the cleaned_file_sorted_on_ts file
cmd = "tail -n +2 " + cleaned_file_sorted_on_ts + ".csv" + " > tmp.csv && mv tmp.csv " + cleaned_file_sorted_on_ts + ".csv"
print(cmd)
proc = Popen(cmd, shell=True, stdout=PIPE)

with open(cleaned_file_sorted_on_ts + ".csv","r") as infile:
    first_line = infile.readline().strip('\n')
    print("First line in cleaned file = {}".format(first_line))

The output I get is:

tail -n +2 /ghostcache/Run.multi.rollout/h2_lines_cleaned_sorted.csv > tmp.csv && mv tmp.csv /ghostcache/Run.multi.rollout/h2_lines_cleaned_sorted.csv
First line in cleaned file = r2,r5,r7,r12,r15,r70,r83
Traceback (most recent call last):
  File "process_r83.py", line 51, in <module>
    first_ts = int(float(first_line.split(',')[0]))
ValueError: could not convert string to float: 'r2'
Ira
  • Please make a minimal reproducible example including minimal but complete code, some example data, and desired output. This might also help: [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Jan 26 '23 at 04:26
  • FWIW, you could clean up the definition of `cmd` by using an f-string, like `f"tail -n +2 {cleaned_file_sorted_on_ts}.csv > tmp.csv && ..."` – wjandrea Jan 26 '23 at 04:29
  • "Popen" runs the subprocess in parallel if you don't `wait`. Probably you are reading the file before the file was changed by the shell calls. – Michael Butscher Jan 26 '23 at 04:29
  • Why not just do `sorted_df.to_csv(..., header=False)`? – wjandrea Jan 26 '23 at 04:32
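A minimal sketch of the `header=False` suggestion, assuming the same seven column names as in the question; the sample values and the file name `sorted_demo.csv` are made up for illustration. Because pandas never writes the header line, there is no `tail`/`mv` step and therefore no race with `Popen`:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data using the question's column names
cols = ['r2', 'r5', 'r7', 'r12', 'r15', 'r70', 'r83']
df = pd.DataFrame([[3.5, 1, 2, 3, 4, 5, 6],
                   [1.25, 7, 8, 9, 10, 11, 12]], columns=cols)

sorted_df = df.sort_values(by=['r2'], ascending=True)

# Write only the data rows: no index column, no header line
path = os.path.join(tempfile.mkdtemp(), 'sorted_demo.csv')
sorted_df.to_csv(path, index=False, header=False)

# The first line read is now the first (smallest r2) data row
with open(path) as infile:
    first_line = infile.readline().strip('\n')

first_ts = int(float(first_line.split(',')[0]))
print("First line in cleaned file = {}".format(first_line))
```

This also removes the dependency on a shell and on the external `tail` and `mv` commands.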

1 Answer


Reload the file into a pandas DataFrame after the shell command has removed the header line, then read the first row of the DataFrame instead of the raw file. Can you try this out?

import pandas as pd
from subprocess import Popen, PIPE

# Sort the cleaned file on r2
df = pd.read_csv(cleaned_file + ".csv", names=['r2','r5','r7','r12','r15','r70','r83'])
sorted_df = df.sort_values(by=["r2"], ascending=True)
sorted_df.to_csv(cleaned_file_sorted_on_ts + '.csv', index=False)

# Remove the header line from the cleaned_file_sorted_on_ts file
cmd = "tail -n +2 " + cleaned_file_sorted_on_ts + ".csv" + " > tmp.csv && mv tmp.csv " + cleaned_file_sorted_on_ts + ".csv"
print(cmd)
proc = Popen(cmd, shell=True, stdout=PIPE)
proc.communicate()  # wait for tail/mv to finish before the file is read again

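# Re-load the file into a DataFrame; header=None because the header line is gone,
# otherwise pandas would consume the first data row as column names
df = pd.read_csv(cleaned_file_sorted_on_ts + ".csv", header=None, names=['r2','r5','r7','r12','r15','r70','r83'])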

# Get the first line of the DataFrame
first_line = df.iloc[0]
print("First line in cleaned file = {}".format(first_line))
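If the shell step is kept, the command has to finish before the file is reopened, as noted in the comments on the question. A minimal sketch using `subprocess.run`, which blocks until the command exits. It assumes a POSIX system with `tail` on the `PATH`, and the file name `demo.csv` and its contents are hypothetical:

```python
import os
import subprocess
import tempfile

import pandas as pd

cols = ['r2', 'r5', 'r7', 'r12', 'r15', 'r70', 'r83']

# Hypothetical file: a header line followed by two data rows
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo.csv')
with open(path, 'w') as f:
    f.write(','.join(cols) + '\n')
    f.write('1.5,1,2,3,4,5,6\n')
    f.write('2.5,7,8,9,10,11,12\n')

# Drop the header line; unlike a bare Popen, run() returns only after tail exits
tmp = os.path.join(workdir, 'tmp.csv')
with open(tmp, 'w') as out:
    subprocess.run(['tail', '-n', '+2', path], stdout=out, check=True)
os.replace(tmp, path)  # replace the original with the header-less copy

# The file has no header now, so tell pandas not to look for one
df = pd.read_csv(path, header=None, names=cols)
first_row = df.iloc[0]
print(first_row['r2'])
```

Passing the command as a list avoids `shell=True`, and `check=True` raises `CalledProcessError` if `tail` fails instead of silently reading a stale file.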
Gihan