I have many large tab-separated files saved as .txt, each of which has seven columns with the following headers:
#column_titles = ["col1", "col2", "col3", "col4", "col5", "col6", "text"]
I would like to simply extract the final column, named text, and save it into a new file, with each row being a row from the original file. The values are all strings.
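Since the column names are already known, a csv.DictReader given those names can pick out the text field by name instead of by position. A minimal sketch, assuming Python 3 and files with no header line; the sample string below is hypothetical and stands in for one of the .txt files. If the first line of each file already holds these headers, dropping fieldnames= makes DictReader read the names from that line instead:

```python
import csv
import io

# Header names from the question; the files are assumed to have no header line
column_titles = ["col1", "col2", "col3", "col4", "col5", "col6", "text"]

# Hypothetical sample standing in for one of the .txt files
sample = (
    "a\tb\tc\td\te\tf\tHow the team did Thursday\n"
    "1\t2\t3\t4\t5\t6\tsee here\n"
)

# Each row comes back as a dict keyed by the names in column_titles
reader = csv.DictReader(io.StringIO(sample), fieldnames=column_titles, delimiter="\t")
texts = [row["text"] for row in reader]
print(texts)  # ['How the team did Thursday', 'see here']
```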
EDIT: This is not a duplicate of a similar problem, since splitlines() was not necessary in my case; only the order of things needed to be improved.
Based on several other posts, here is my current attempt:
import csv

# File names: to read in from and read out to
input_file = "tester_2014-10-30_til_2014-08-01.txt"
output_file = input_file + "-SA_input.txt"

## ==================== ##
##  Using module 'csv'  ##
## ==================== ##

with open(input_file) as to_read:
    reader = csv.reader(to_read, delimiter = "\t")
    desired_column = [6]        # text column
    for row in reader:
        myColumn = list(row[i] for i in desired_column)
with open(output_file, "wb") as tmp_file:
    writer = csv.writer(tmp_file)
    for row in myColumn:
        writer.writerow(row)
What I get is simply the text field from the 2624th row of my input file, with each letter of that string separated out:
H,o,w, ,t,h,e, ,t.e.a.m, ,d,i,d, ,T,h,u,r,s,d,a,y, ,-, ,s,e,e , ,h,e,r,e
I know very little in the world of programming is random, but this is definitely strange!
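Two things combine to produce that output, assuming the writing block runs after the reading loop has finished (which the output suggests). First, myColumn is reassigned on every pass of the loop, so afterwards it holds only the last row's text, as a one-element list. Second, for row in myColumn then hands writer.writerow a plain string, and writerow treats any iterable as a sequence of fields, so each character becomes its own field. The second effect can be reproduced in isolation with an in-memory buffer (Python 3 assumed):

```python
import csv
import io

out = io.StringIO()
writer = csv.writer(out)

# A bare string is a sequence of characters, so each character becomes a field
writer.writerow("Thursday")
# Wrapping the string in a list makes it a single field
writer.writerow(["Thursday"])

print(repr(out.getvalue()))  # 'T,h,u,r,s,d,a,y\r\nThursday\r\n'
```

So the write has to happen inside the reading loop, with each value wrapped in a list.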
This post is quite similar to what I need, but it leaves out the writing and saving parts, which I am also unsure about.
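The writing and saving can be done in the same pass as the reading, by keeping both files open at once. A minimal sketch, assuming Python 3 and UTF-8 files (the "wb" mode in the attempt is the Python 2 idiom; under Python 3 the csv documentation recommends text mode with newline=""). extract_text_column is a hypothetical helper name, and column=6 matches desired_column in the attempt:

```python
import csv


def extract_text_column(input_file, output_file, column=6):
    """Copy one column of a tab-separated file into a new file, one value per row.

    Both files are open at the same time, so each row is written as soon as
    it is read instead of only the last row surviving the loop.
    """
    # newline="" is what the csv docs recommend for files handed to csv (Python 3)
    with open(input_file, newline="", encoding="utf-8") as to_read, \
         open(output_file, "w", newline="", encoding="utf-8") as to_write:
        reader = csv.reader(to_read, delimiter="\t")
        writer = csv.writer(to_write)
        count = 0
        for row in reader:
            if len(row) > column:  # skip blank or short lines
                writer.writerow([row[column]])  # a list, so the string stays one field
                count += 1
    return count
```

For example, extract_text_column(input_file, output_file) with the two file names from the attempt. csv.writer quotes any value that contains a comma, a double quote or a line break; if the text fields themselves contain unbalanced double quotes, passing quoting=csv.QUOTE_NONE to csv.reader may be needed so the reader does not treat them as quoted fields spanning several lines.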
I have looked into using the pandas toolbox (as suggested in one of the posts linked above), but I am unable to because of my Python installation, so please suggest only solutions using csv or other built-in modules!
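Because there are many files and only built-in modules are an option, a loop over glob.glob can apply the same idea to each file in turn. A sketch under assumptions: Python 3, UTF-8 files matching *.txt, and outputs named like output_file in the attempt (input name plus "-SA_input.txt"); extract_all is a hypothetical name. Unlike the csv.writer version, this one writes each value as a plain line:

```python
import csv
import glob


def extract_all(pattern="*.txt", suffix="-SA_input.txt"):
    """Write the last column of every file matching pattern to <name><suffix>."""
    written = []
    # sorted() only makes the processing order predictable
    for input_file in sorted(glob.glob(pattern)):
        if input_file.endswith(suffix):
            continue  # outputs also end in .txt; don't re-process them
        output_file = input_file + suffix
        with open(input_file, newline="", encoding="utf-8") as to_read, \
             open(output_file, "w", encoding="utf-8") as to_write:
            for row in csv.reader(to_read, delimiter="\t"):
                if row:  # csv.reader returns [] for blank lines
                    to_write.write(row[-1] + "\n")  # last column, as plain text
        written.append(output_file)
    return written
```

For example, extract_all() processes every matching file in the working directory and returns the output paths. Writing plain lines leaves the text exactly as it was, without the quoting csv.writer adds around commas and quotes; the trade-off is that a value containing a line break would spill over two output lines.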