
I have found variants of solutions to this question on Stack Overflow but still can't seem to figure out why my use case does not work.

I have a large number of .LAS files that I want to concatenate. These are column based text files with a similar format.

Text Example:

# Produced by RMS 12
~VERSION
VERS        .          2                       :CWLS Log Ascii Standard - Version 2
WRAP        .          NO                      :One line per depth step
DLM         .          SPACE                   :Delimiting character (SPACE, TAB or COMMA)

~WELL
# Keyword.Unit         Data Type               Information
# ------------         ---------               -------------------------------
WELL        .          C-34B                   :WELL
STRT        .U.S. ft   53.8099                 :First index value
STOP        .U.S. ft   16129.8099              :Last index value
STEP        .U.S. ft   0.5000                  :Step of index
NULL        .          -999.25                 :NoValue
FLD         .                                  :FIELD
COMP        .                                  :COMPANY
CTRY        .                                  :COUNTRY
DATE        .                                  :DATE DD/MM/YYYY
SRVC        .                                  :SERVICE COMPANY
LOC         .                                  :LOCATION
PROV        .                                  :PROVINCE
UWI         .                                  :UNIQUE WELL ID
X           .U.S. ft   664504.000              :X-coordinate of Well Head
Y           .U.S. ft   5959656.990             :Y-coordinate of Well Head
RKB         .U.S. ft   83.3200                 :RKB
WATER_DEPTH .U.S. ft   0.0000                  :Seabed or ground level

~PARAMETER
# Keyword.Unit         Value                   Description
# ------------         ---------               -------------------------------

~CURVE
# Name.Unit                                    Curve Description
# ---------                                    -----------------
MD          .U.S. ft                           :1 Index
ZoneLog_2018_JAO.                                  :2

~ASCII MD             ZoneLog_2018_JAO
53.8099        -999.25        
54.3099        -999.25        
54.8099        -999.25        
55.3099        -999.25        
55.8099        -999.25        

I would like to add the file name ("01-01" in this case) as the first column and then concatenate the data below the ~ASCII MD row.

I.e. the data would look like this if I had more than one file:

Well Name   MD  _ZoneLog_JAO
01-01A  5000    9
01-01A  5001    9
01-01A  5002    10
01-01B  4999    9
01-01B  5000    9
01-01B  5001    10
01-01C  4856    8
01-01C  4857    9
01-01C  4859    10

I have attempted the script below to accomplish this (adapted from "Import multiple csv files into pandas and concatenate into one DataFrame"):

import pandas as pd
import os
import glob

merged_files = glob.glob("*.las")  # creates a list of all LAS files

data = []  # pd.concat takes a list of DataFrames as an argument
for LAS in merged_files:
    frame = pd.read_csv(LAS, skiprows=37)
    frame['filename'] = os.path.splitext(LAS)[0]
    data.append(frame)

merge = pd.concat(data, ignore_index=True)  # don't want pandas to try to align row indexes

merge.to_csv("Merged_FFM15_ZoneLogs.txt", index=False)

print(merge.columns)
print(merge.shape)

This creates an output file called "Merged_FFM15_ZoneLogs.txt" that kind of gets me there, but the formatting looks a little odd.

Index([u'-0.0000        -999.25        ', u'0.0000         -999.25        ',
       u'filename', u'~ASCII MD             ZoneLog_2018_JAO'],
      dtype='object')
(81456352, 4)

I am not sure why the index ends up like this. I think it might have to do with one spurious file, but since I am looking at around 1,000 files I am not sure how to track it down. Maybe it has to do with the irregular delimiter? I am a bit at a loss.

It also only seems to work with a small number of files instead of the 1,000-plus that I need to combine. One workaround for the memory limit I found is to run it on 64-bit Python instead of the 32-bit option. Is this the only way to do this?

In a perfect world I would be able to keep only certain columns in the final file and also filter certain rows out (i.e. rows that have -999.25 in the _ZoneLog_JAO column). In the variants of this code I have tried, I ran into MemoryError and `TypeError: cannot concatenate object of type '<class 'pandas.io.parsers.TextFileReader'>'; only Series and DataFrame objs are valid` errors.

geop
  • don't post screen shots of code and data. post the code and data as text – Paul H Sep 04 '20 at 19:40
  • take a look at [fastest file format for pandas](https://stackoverflow.com/q/22941147/6692898), once you loaded and concatenated your data you can save to file with a binary format for faster access – RichieV Sep 04 '20 at 19:49
  • Thanks for the advice @Paul H – geop Sep 04 '20 at 19:55
  • if the total data is too much for a single df you might want to drop the columns/rows you don't need before appending to `li`, and also you could save back to several files, but potentially many less files and already in a pandas friendly format – RichieV Sep 04 '20 at 19:55

0 Answers