I have found variants of solutions to this question on Stack Overflow, but I still can't figure out why my use case does not work.
I have a large number of .LAS files that I want to concatenate. These are column based text files with a similar format.
Text Example:
# Produced by RMS 12
~VERSION
VERS . 2 :CWLS Log Ascii Standard - Version 2
WRAP . NO :One line per depth step
DLM . SPACE :Delimiting character (SPACE, TAB or COMMA)
~WELL
# Keyword.Unit Data Type Information
# ------------ --------- -------------------------------
WELL . C-34B :WELL
STRT .U.S. ft 53.8099 :First index value
STOP .U.S. ft 16129.8099 :Last index value
STEP .U.S. ft 0.5000 :Step of index
NULL . -999.25 :NoValue
FLD . :FIELD
COMP . :COMPANY
CTRY . :COUNTRY
DATE . :DATE DD/MM/YYYY
SRVC . :SERVICE COMPANY
LOC . :LOCATION
PROV . :PROVINCE
UWI . :UNIQUE WELL ID
X .U.S. ft 664504.000 :X-coordinate of Well Head
Y .U.S. ft 5959656.990 :Y-coordinate of Well Head
RKB .U.S. ft 83.3200 :RKB
WATER_DEPTH .U.S. ft 0.0000 :Seabed or ground level
~PARAMETER
# Keyword.Unit Value Description
# ------------ --------- -------------------------------
~CURVE
# Name.Unit Curve Description
# --------- -----------------
MD .U.S. ft :1 Index
ZoneLog_2018_JAO. :2
~ASCII MD ZoneLog_2018_JAO
53.8099 -999.25
54.3099 -999.25
54.8099 -999.25
55.3099 -999.25
55.8099 -999.25
I would like to add the file name ("01-01" in this case) as the first column and then concatenate the data below the ~ASCII MD row.
I.e. the data would look like this if I had more than one file:
Well Name MD _ZoneLog_JAO
01-01A 5000 9
01-01A 5001 9
01-01A 5002 10
01-01B 4999 9
01-01B 5000 9
01-01B 5001 10
01-01C 4856 8
01-01C 4857 9
01-01C 4859 10
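To make the target clearer, here is a minimal sketch of how I imagine reading a single file should work, using an inlined snippet of the text above instead of a real file path (the column names and the dynamic ~ASCII lookup are my assumptions, since the header may not always be exactly 37 lines):

```python
import io
import os
import pandas as pd

# Inlined stand-in for one .las file (header truncated to the ~ASCII row).
las_text = """~ASCII MD ZoneLog_2018_JAO
53.8099 -999.25
54.3099 -999.25
"""

# Locate the ~ASCII line instead of hard-coding skiprows=37,
# in case the header length varies between files.
lines = las_text.splitlines()
header_row = next(i for i, line in enumerate(lines) if line.startswith("~ASCII"))

frame = pd.read_csv(
    io.StringIO(las_text),
    skiprows=header_row + 1,           # skip everything up to and including ~ASCII
    sep=r"\s+",                        # whitespace-delimited columns
    names=["MD", "ZoneLog_2018_JAO"],  # assumed names taken from the ~ASCII row
)

# Well name derived from the file name, e.g. "01-01A.las" -> "01-01A".
well = os.path.splitext(os.path.basename("01-01A.las"))[0]
frame.insert(0, "Well Name", well)
```

In the real script the `io.StringIO` would be replaced by the actual file path from `glob`.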
I have attempted the script below to accomplish this (adapted from Import multiple csv files into pandas and concatenate into one DataFrame):
import pandas as pd
import os
import glob

merged_files = glob.glob("*.las")  # creates a list of all LAS files
data = []  # pd.concat takes a list of DataFrames as an argument
for LAS in merged_files:
    frame = pd.read_csv(LAS, skiprows=37)
    frame['filename'] = os.path.splitext(LAS)[0]
    data.append(frame)
merge = pd.concat(data, ignore_index=True)  # don't want pandas to try to align row indexes
merge.to_csv("Merged_FFM15_ZoneLogs.txt", index=False)
print(merge.columns)
print(merge.shape)
This creates an output file called "Merged_FFM15_ZoneLogs.txt" that kind of gets me there, but the formatting looks odd:
Index([u'-0.0000 -999.25 ', u'0.0000 -999.25 ',
u'filename', u'~ASCII MD ZoneLog_2018_JAO'],
dtype='object')
(81456352, 4)
I am not sure why the index ends up like this. I think it might be caused by one spurious file, but since I am looking at roughly 1000 files I am not sure how to track it down. Maybe it has to do with the irregular delimiter? I am a bit at a loss.
It also only seems to work with a small number of the files instead of the 1000-plus that I need to combine. One workaround for the file size that I found is to run it on 64-bit Python instead of the 32-bit option. Is that the only way to do this?
In a perfect world I would be able to keep only certain columns in the final file and also filter out certain rows (i.e. rows that have -999.25 in the _ZONE_JAO column). In the variants of this code I have tried, I ran into "memory errors" and TypeError: cannot concatenate object of type '<class 'pandas.io.parsers.TextFileReader'>'; only Series and DataFrame objs are valid.
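For the memory side, my understanding is that read_csv with chunksize returns a TextFileReader, an iterator of DataFrames that has to be consumed in a loop rather than passed to concat directly, which I suspect explains the TypeError above. This is the pattern I have been trying to get right, shown here on inlined data (the column names are assumptions):

```python
import io
import pandas as pd

# Stand-in for one large file's data section.
csv_text = "MD ZoneLog_2018_JAO\n5000 9\n5001 -999.25\n5002 10\n"

chunks = []
# chunksize turns read_csv into an iterator of DataFrames (a TextFileReader);
# pd.concat cannot take the reader itself, only the DataFrames it yields.
reader = pd.read_csv(io.StringIO(csv_text), sep=r"\s+", chunksize=2)
for chunk in reader:
    # drop the -999.25 NoValue rows as each chunk arrives, to keep memory down
    chunk = chunk[chunk["ZoneLog_2018_JAO"] != -999.25]
    chunks.append(chunk)

merged = pd.concat(chunks, ignore_index=True)
```

Is this roughly the right shape for filtering while reading, or is there a better way to keep 1000-plus files within memory on 32-bit Python?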