In pandas read_csv, is there a way to specify e.g. col1, col15, wholeline?
I am trying to import about 700,000 rows of data from a text file that uses carets ('^') as field delimiters, has no text qualifiers, and uses a carriage return as the line delimiter.
From the text file I need column 1, column 15 and then the whole line in three columns of a table/dataframe.
I've searched how to do this in pandas, but don't know it well enough to get the logic. I can import fine for all 26 columns, but that doesn't help my problem.
my_df = pd.read_csv("tablefile.txt", sep="^", lineterminator="\r", low_memory=False)
Or I can use standard Python to put the data into a table, but this takes about 4 hours for the 700,000 rows, which is far too long for me.
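import re

count_1 = 0
for line in open('tablefile.txt'):
    # Stop after a small sample while testing.
    if count_1 > 70:
        break
    else:
        # Column 1: leading digits up to the first caret.
        col1id = re.findall(r'^(\d+)\^', line)
        # Column 15: digits followed by exactly 11 more caret-separated fields.
        col15id = re.findall(r'^.*\^.*\^(\d+)\^.*\^.*\^.*\^.*\^.*\^.*\^.*\^.*\^.*\^.*\^.*', line)
        line = line.strip()
        count_1 = count_1 + 1
        cur.execute('''INSERT INTO mytable (mycol1id, mycol15id, wholeline) VALUES (?, ?, ?)''',
                    (col1id[0], col15id[0], line, ) )
        # One commit per inserted row.
        conn.commit()
print('row count_1=',count_1)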
In pandas read_csv, is there a way to specify e.g. col1, col15, wholeline?
As above, col1 and col15 are digits and wholeline is a string. I do not want to rebuild the string after import, as I might lose some characters in the process.
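One possible direction, as a rough sketch rather than a confirmed read_csv feature: read each line as a single string, then split only as far as column 15, so the original line is never rebuilt. The helper name load_three_columns and its parameters are hypothetical, and the sample rows are made up for illustration. For the real file, the lines could presumably come from open("tablefile.txt", newline="\r"), assuming the file really is terminated by bare carriage returns.

```python
import pandas as pd


def load_three_columns(lines, sep="^", first=0, second=14):
    """Build a (col1, col15, wholeline) frame from an iterable of raw lines.

    Hypothetical helper: `first` and `second` are zero-based field positions.
    """
    # Keep each line exactly as read, minus its line terminator; skip blanks.
    cleaned = [ln.strip("\r\n") for ln in lines]
    whole = pd.Series([ln for ln in cleaned if ln], dtype="object")
    # Split only up to the last field needed; the remainder stays unsplit.
    parts = whole.str.split(sep, n=second + 1, expand=True)
    return pd.DataFrame(
        {
            "col1": pd.to_numeric(parts[first]),
            "col15": pd.to_numeric(parts[second]),
            "wholeline": whole,
        }
    )


# Made-up sample: three rows of 26 caret-separated numeric fields.
sample_rows = ["^".join(str(100 * r + c) for c in range(1, 27)) for r in range(3)]
df = load_three_columns(sample_rows)
print(df[["col1", "col15"]])
```

Because wholeline is taken from the raw line before any splitting, no characters should be lost in that column.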
Thanks
EDIT: Committing to the database for each line was burning time.
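To illustrate the point about commits, here is a minimal sketch that collects the rows first, inserts them with one executemany call, and commits once at the end. It also swaps the regular expressions for a plain str.split, which is a substitution on my part, not what the loop above does. The function name insert_lines, the in-memory database, and the table schema are assumptions for the sake of a runnable example.

```python
import sqlite3


def insert_lines(conn, lines, sep="^"):
    """Insert (col1, col15, wholeline) rows in one batch with a single commit."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split only as far as column 15; index 0 is column 1, index 14 is column 15.
        fields = line.split(sep, 15)
        rows.append((int(fields[0]), int(fields[14]), line))
    conn.executemany(
        "INSERT INTO mytable (mycol1id, mycol15id, wholeline) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()  # one commit for the whole batch
    return len(rows)


# Assumed schema and made-up data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (mycol1id INTEGER, mycol15id INTEGER, wholeline TEXT)"
)
sample_lines = ["^".join(str(100 * r + c) for c in range(1, 27)) + "\r" for r in range(3)]
inserted = insert_lines(conn, sample_lines)
stored = conn.execute(
    "SELECT mycol1id, mycol15id FROM mytable ORDER BY mycol1id"
).fetchall()
print(inserted, stored)
```

For a file too large to hold in memory, the same idea should work by calling executemany on chunks of rows and committing after each chunk.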