How to remove the first four lines and the last 12 lines of a file in Python?

Question

        h = httplib.HTTPSConnection(host, port)
        h.set_debuglevel(0)

        headers = {
            "Content-Type": "multipart/form-data; boundary=%s" % (boundary,),
            "Connection": "Keep-Alive",
        }

        h.request('POST', uri, body, headers)
        res = h.getresponse()
        #print res.read()
        data = """MIME-Version: 1.0
        Content-Type: multipart/mixed; boundary=--Nuance_NMSP_vutc5w1XobDdefsYG3wq
        """ + res.read()

        msg = email.message_from_string(data)
        #print msg

        for index, part in enumerate(msg.walk(), start=1):
            content_type = part.get_content_type()
            #print content_type
            payload = part.get_payload()
            print res.getheaders()

            if content_type == "audio/x-wav" and len(payload):
                with open('output.pcm', 'wb') as f_pcm:
                    f_pcm.write(payload)

I am sending a request to the server, and the server sends a response back to the client, as shown above, in the form of a .txt file. The file contains an information header at the top and another header at the bottom, both in text format; the rest is binary.

How do I parse the text and write it to a separate .txt file, and write the binary data to a .pcm file?

Please could you edit the question to include the code you are using to get this. — Martin Evans, Feb 02 '16 at 17:29
Use python's [`email`](https://docs.python.org/2/library/email.html#module-email) package for parsing MIME. — jofel, Feb 02 '16 at 17:49
can you please tell me how to do that ? I am trying and not getting. — sam, Feb 02 '16 at 17:50
See also the following [related question](http://stackoverflow.com/questions/35154683/how-to-read-the-response-which-is-a-sound-file-in-python) from the OP. — Martin Evans, Feb 02 '16 at 17:57
http://stackoverflow.com/questions/2064184/remove-lines-from-textfile-with-python — DevLounge, Feb 02 '16 at 18:52
i tried that but can you please tell me how to do it for my example ? — sam, Feb 02 '16 at 19:23
Show us what you tried. Post a [minimal, complete, verifiable example](http://stackoverflow.com/help/mcve). Your code example does not work. `uri`, `h`, `boundary` and `body` are not defined. — Mark Tolonen, Feb 03 '16 at 07:09
@sam Can you please attach your file `Output.txt`, e.g. with Skydrive or Dropbox, so that we have an example to work with? Using your posted file content does not work for me. — wewa, Feb 03 '16 at 07:16

Martin Evans · Accepted Answer · 2016-02-12T11:13:52.460

1

The recommended approach is to use Python's `email` library to decode the MIME response:

import ssl
import os
import json
import email
import uuid
from io import BytesIO
import httplib


input_folder = os.path.dirname(os.path.abspath(__file__)) 
output_folder = os.path.join(input_folder, 'output')

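def get_filename(ext, base, sub_folder):
    folder = os.path.join(output_folder, sub_folder)
    if not os.path.exists(folder):
        os.makedirs(folder)  # open() fails if the target folder is missing
    filename = '{}.{}'.format(base, ext)
    return os.path.join(folder, filename)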

def compare_files(file1, file2):
    with open(file1, 'rb') as f_file1, open(file2, 'rb') as f_file2:
        if f_file1.read() == f_file2.read():
            print 'Same:\n  {}\n  {}'.format(file1, file2)
        else:
            print 'Different:\n  {}\n  {}'.format(file1, file2)

class Part(object):
    """Represent a part in a multipart message"""

    def __init__(self, name, contentType, data, paramName=None):
        super(Part, self).__init__()
        self.name = name
        self.paramName = paramName
        self.contentType = contentType
        self.data = data

    def encode(self):
        body = BytesIO()

        if self.paramName:
            body.write('Content-Disposition: form-data; name="%s"; paramName="%s"\r\n' % (self.name, self.paramName))
        else:
            body.write('Content-Disposition: form-data; name="%s"\r\n' % (self.name,))

        body.write("Content-Type: %s\r\n" % (self.contentType,))
        body.write("\r\n")
        body.write(self.data)
        return body.getvalue()

class Request(object):
    """A handy class for creating a request"""

    def __init__(self):    
        super(Request, self).__init__()
        self.parameters = []

    def add_json_parameter(self, name, paramName, data):
        self.parameters.append(Part(name=name, paramName=paramName, contentType="application/json; charset=utf-8", data=data))

    def add_audio_parameter(self, name, paramName, data):
        self.parameters.append(Part(name=name, paramName=paramName, contentType="audio/x-wav;codec=pcm;bit=16;rate=16000", data=data))

    def encode(self):
        boundary = uuid.uuid4().hex
        body = BytesIO()

        for parameter in self.parameters:
            body.write("--%s\r\n" % (boundary,))
            body.write(parameter.encode())
            body.write("\r\n")

        body.write("--%s--\r\n" % (boundary,))
        return body.getvalue(), boundary


def get_tts(required_text, LNG):
    required_text = required_text.strip()
    output_filename = "".join([x if x.isalnum() else "_" for x in required_text[:80]]) 

    host = "mtldev08.nuance.com"
    port = 443
    uri = "/NmspServlet/"

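    if LNG == "ENG":
        parameters = {'lang' : 'eng_GBR', 'location' : '47.4925, 19.0513'}
    elif LNG == "GED":
        parameters = {'lang' : 'deu-DEU', 'location' : '48.396231, 9.972909'}
    else:
        # Fail early instead of hitting a NameError on `parameters` below
        raise ValueError('Unsupported language: {}'.format(LNG))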

    RequestData = """{
        "appKey": "9c9fa7201e90d3d96718bc3f36ce4cfe1781f2e82f4e5792996623b3b474fee2c77699eb5354f2136063e1ff19c378f0f6dd984471a38ca5c393801bffb062d6",
        "appId": "NMDPTRIAL_AutomotiveTesting_NCS61HTTP",
        "uId": "Alexander",
        "inCodec": "PCM_16_8K",
        "outCodec": "PCM_16_8K",
        "cmdName": "NVC_TTS_CMD",
        "appName": "Python",
        "appVersion": "1",
        "language": "%(lang)s",
        "carrier": "carrier",
        "deviceModel": "deviceModel",
        "cmdDict": {
            "tts_voice": "Serena",
            "tts_language": "%(lang)s",
            "locale": "canada",
            "application_name": "Testing Python Script",
            "organization_id": "NUANCE",
            "phone_OS": "4.0",
            "phone_network": "wifi",
            "audio_source": "SpeakerAndMicrophone",
            "location": "%(location)s",
            "application_session_id": "1234567890",
            "utterance_number": "5",
            "ui_langugage": "en",
            "phone_submodel": "nmPhone2,1",
            "application_state_id": "45"        
        }
    }""" % (parameters)

    TEXT_TO_READ = """{
        "tts_type": "text"
    }"""

    TEXT_TO_READ = json.loads(TEXT_TO_READ)
    TEXT_TO_READ["tts_input"] = required_text
    TEXT_TO_READ = json.dumps(TEXT_TO_READ)

    request = Request()
    request.add_json_parameter("RequestData", None, RequestData)
    request.add_json_parameter("TtsParameter", "TEXT_TO_READ", TEXT_TO_READ)

    #ssl._create_default_https_context = ssl._create_unverified_context
    body, boundary = request.encode()
    h = httplib.HTTPSConnection(host, port)
    #h.set_debuglevel(1)

    headers = {
        "Content-Type": "multipart/form-data; boundary=%s" % (boundary,),
        "Connection": "Keep-Alive",
    }

    h.request('POST', uri, body, headers)
    res = h.getresponse()

    data = """MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=--Nuance_NMSP_vutc5w1XobDdefsYG3wq
""" + res.read()

    msg = email.message_from_string(data)

    for part in msg.walk():
        content_type = part.get_content_type()
        payload = part.get_payload()

        if content_type == "audio/x-wav" and len(payload):
            ref_filename = get_filename('pcm', output_filename + '_ref', LNG)
            if not os.path.exists(ref_filename):
                with open(ref_filename, 'wb') as f_pcm:
                    f_pcm.write(payload)

            cur_filename = get_filename('pcm', output_filename, LNG)
            with open(cur_filename, 'wb') as f_pcm:
                f_pcm.write(payload)

            compare_files(ref_filename, cur_filename)

        elif content_type == "application/json":
            with open(get_filename('json', output_filename, LNG), 'w') as f_json:
                f_json.write(payload)


filename = r'input.txt'

with open(filename) as f_input:
    for line in f_input:
        LNG, text = line.strip().split('|')
        print "Getting {}: {}".format(LNG, text)
        get_tts(text, LNG)

This assumes your input.txt file has the following format:

ENG|I am tired
GED|Ich gehe nach hause

This will produce an output pcm and json file per line of text. It works with multiple files/languages.
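
The key step, prepending a MIME header so that the `email` package can split the response, can be tried in isolation. The following is a minimal sketch written for Python 3 (the answer's code is Python 2), where `email.message_from_bytes` takes the place of `email.message_from_string`. The function name `split_multipart`, the boundary `example_boundary`, and the sample body are all hypothetical stand-ins for the real server response.

```python
import email


def split_multipart(raw_body, boundary):
    """Return (json_text, audio_bytes) from a multipart body that lacks top-level headers."""
    # Prepend the headers the parser needs, followed by a blank separator line.
    header = (
        "MIME-Version: 1.0\r\n"
        'Content-Type: multipart/mixed; boundary="{}"\r\n'
        "\r\n"
    ).format(boundary).encode("ascii")
    msg = email.message_from_bytes(header + raw_body)

    json_text, audio = None, None
    for part in msg.walk():
        content_type = part.get_content_type()
        if content_type == "application/json":
            json_text = part.get_payload(decode=True).decode("utf-8")
        elif content_type == "audio/x-wav":
            # decode=True returns the payload as bytes
            audio = part.get_payload(decode=True)
    return json_text, audio


# Hypothetical sample data standing in for res.read()
boundary = "example_boundary"
AUDIO = b"\x00\x01\x02\xff\xfe"
delimiter = b"--" + boundary.encode("ascii")
raw_body = b"\r\n".join([
    delimiter,
    b"Content-Type: application/json",
    b"",
    b'{"status": "ok"}',
    delimiter,
    b"Content-Type: audio/x-wav",
    b"",
    AUDIO,
    delimiter + b"--",
    b"",
])

json_text, audio = split_multipart(raw_body, boundary)
```

With a real response, `boundary` would have to match the delimiter the server actually uses, which can be found by printing the raw data.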

edited Feb 12 '16 at 11:13

answered Feb 03 '16 at 07:07

Martin Evans

it is generating plain1.txt. it is reading complete .txt file what is the res.read but it din parse it. can you help me ? – sam Feb 03 '16 at 07:34
To help further (and be able to run the script) I would need to know the parameters needed to call `h.request`. The data you copy pasted in the question is not suitable. – Martin Evans Feb 03 '16 at 07:38
can you please see once ? – sam Feb 03 '16 at 08:36
I have added a simple way to extract the raw pcm to a file, I was able to hear the returned words. – Martin Evans Feb 03 '16 at 09:05
thanks but how to read the remaining text in the res and write it to a another file ? – sam Feb 03 '16 at 09:17
The returned data needed a suitable header to be added for the decoding to work correctly. It should work now. – Martin Evans Feb 03 '16 at 09:34
That is all of the json payload, the remaining "text" in the response is just [MIME headers](https://en.wikipedia.org/wiki/MIME#Multipart_messages) and formatting, `print res.getheaders()` would show you this. – Martin Evans Feb 03 '16 at 09:53
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/102563/discussion-between-sam-and-martin-evans). – sam Feb 04 '16 at 07:57
there will be several messages within input.txt. Finally I should have a respective .txt and .pcm for it. – sam Feb 04 '16 at 11:10
million thanksss. how to save it as a .json instead of .txt ? so that I will have my code in json format – sam Feb 04 '16 at 11:47
I did that but finally it does not contain data in json format. it will be in the form of .txt (horizontal line) – sam Feb 04 '16 at 11:55
json can be horizontal or vertical, both will parse fine. – Martin Evans Feb 04 '16 at 12:08
If I want to represent that in vertical . is it possible ? – sam Feb 04 '16 at 12:09
I suggest you pass the language as a parameter to the function. See my edit. – Martin Evans Feb 10 '16 at 07:15
but if it is ENG then go to that folder and if it is GED go to that path. how to do something like that ? request data varies from one language to another language. how can i do that ? – sam Feb 10 '16 at 07:42
Just add another parameter to get_filename to specify the sub-folder – Martin Evans Feb 10 '16 at 07:48
in my input.txt- one example is - ENG | I am tired. I will get the language as ENG (english). when I call get_tts. if LNG is ENG then take the specific request data and for other language then other request data. if I make the request data separate in the .json file, should I add TEXT_TO_READ also ? . how to read that as the input and send it as a request ? – sam Feb 12 '16 at 09:37
You need to make changes to the JSON text. This could be done using a dictionary to substitute the 3 pieces of information. I have updated the solution. – Martin Evans Feb 12 '16 at 10:20
thanks. but I have only one input.txt- which contains input as - ENG | I am tired and next- GED | Ich gehe nach hause and so on. What to do in such case ? – sam Feb 12 '16 at 10:51
The line `file_inputs = [("GED", ".\.\DATA\GED\GED.txt"), ("ENG", ".\.\DATA\ENG\ENG.txt")]` lets you specify which files you want to parse, and select which language each is in. – Martin Evans Feb 12 '16 at 10:53
thanks. I understood that but in my case there is only one input i.e input.txt - which contains - ENG | I am tired. next line GED | Ich gehe nach hause and so on. – sam Feb 12 '16 at 11:03
You mean you have changed your input file format to specify the language on each line? How do you delimit it? With a `|`? – Martin Evans Feb 12 '16 at 11:05
the above code is throwing an error as File ".\checking.py", line 172, in for LNG, filename in file_inputs: ValueError: need more than 1 value to unpack – sam Feb 12 '16 at 11:07
It means you did not pass either ENG or GED as a language. – Martin Evans Feb 12 '16 at 11:20
if I want to adopt this with a multiple server. how to do that ? give me some ideas, i will try myself. – sam Feb 12 '16 at 12:04
It is very unlikely this would work with a different server as they will have completely different interfaces. It is also quite advanced to work out the correct request/response decoding. I suggest you pick a new server, find out if anyone has attempted something similar, write some code and post a new question. – Martin Evans Feb 12 '16 at 12:24
I have same http request to the other servers also. – sam Feb 12 '16 at 12:36
You could pass `host` as another parameter to `get_tts()` – Martin Evans Feb 12 '16 at 12:42
if host , port and uri - all the three changes then how to adopt this code ? – sam Feb 12 '16 at 12:52
how did you specify the boundary ? - data = """MIME-Version: 1.0 Content-Type: multipart/mixed; boundary=--Nuance_NMSP_vutc5w1XobDdefsYG3wq """ + res.read() ..... In the above code- If i print res. it is printing in binary . – sam Feb 16 '16 at 08:41
I printed the response and copied it. – Martin Evans Feb 16 '16 at 08:45
I messaged you in chat. please reply there. – sam Feb 16 '16 at 08:54
Sorry not near a computer this week. If you print data you should see the boundaries. – Martin Evans Feb 16 '16 at 09:08
http://stackoverflow.com/questions/35587230/how-to-process-the-response-from-the-server-using-python – sam Feb 23 '16 at 20:43
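
On the "vertical" JSON layout asked about in the comments above: one possible way, sketched here for Python 3, is to parse the payload and write it back out with `json.dumps` and an `indent` value. The helper name `pretty_json` and the sample string are hypothetical and not part of the answer's code.

```python
import json


def pretty_json(payload_text):
    """Re-serialize a JSON string with one key per line."""
    return json.dumps(json.loads(payload_text), indent=4, sort_keys=True)


# Hypothetical single-line payload, as the server might return it
compact = '{"b": 1, "a": 2}'
formatted = pretty_json(compact)
```

The result of `pretty_json(payload)` could then be written to the .json file in place of the raw `payload`. Both layouts parse to the same data.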

score -1 · Answer 2 · answered Feb 03 '16 at 06:47

The following sample should work for you.

filecontent = []
with open("Output.txt", "rb") as inputfile:
    for linenr, line in enumerate(inputfile):
        filecontent.append(line)
    linecount = linenr + 1

with open("AsciiOut.txt", "wb") as outputfile, open("BinOut.pcm", "wb") as binoutputfile:
    for linenr, line in enumerate(filecontent):
        if linenr < 4:
            outputfile.write(line)
        elif linenr < linecount - 12:
            binoutputfile.write(line)
        else:
            outputfile.write(line)

wewa

You are solving the symptoms instead of the actual problem; what he really wants is to extract the various parts of a multipart mime message. The "row count" approach is bound to break at any minimum perturbation (including just changing the attached file to something which happens to have a newline in itself). – Matteo Italia Feb 03 '16 at 06:52
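
For the literal question in the title, dropping a fixed number of lines from both ends, list slicing is enough. This is only a sketch under the assumption that the header and footer really are exactly 4 and 12 lines long; as the comment above points out, that assumption breaks as soon as the binary data itself contains a newline. The function name `split_header_footer` and the sample lines are hypothetical.

```python
def split_header_footer(lines, head=4, tail=12):
    """Split lines into (header_and_footer, middle) by fixed line counts."""
    if len(lines) < head + tail:
        raise ValueError("expected at least {} lines".format(head + tail))
    end = len(lines) - tail
    middle = lines[head:end]
    text = lines[:head] + lines[end:]
    return text, middle


# Hypothetical 20-line input: 4 header lines, 4 data lines, 12 footer lines
lines = ["line {}\n".format(i) for i in range(20)]
text, middle = split_header_footer(lines)
```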
