-2

I have a small data set to clean. I have opened the text file in PyCharm. The data set looks like this:

Code-6667+
Name of xyz company+ 
Address +
Number+ 
Contact person+
Code-6668+
Name of abc company, Address, number, contact person+
Code-6669+
name of company, Address+
number, contact person +

I need to separate the code lines and concatenate (or paste) the remaining lines together until the next code line appears. That way I can split my data into two fields: the company code, and all the company details in a single field. The eventual output is a table, something like this:

Code6667 - Company details 
Code6668 - Company details

Is there a way I could use a loop to do this? I tried this in R, but I am now attempting it in Python.

Atlas7
H.Y
  • Post the exact final result for your current input – RomanPerekhrest Jul 19 '17 at 10:00
  • Congratulations, by changing your input example you practically created a whole new question and wasted a lot of valuable time of people that were willing to help. I don't imagine anyone will be so keen the second time around, not at least until you produce a [mcve] of your programming problem. Good luck. – zwer Jul 20 '17 at 17:35
  • Please could you change the input file back to the original version? If your input format is something else, please ask a separate question instead (otherwise all the answers below will have gone to waste; people including myself have spent quite a bit of time figuring out a complete solution and posting it here). Please do respect people's effort, thank you. – Atlas7 Jul 20 '17 at 18:29
  • @Atlas7, @zwer: I really appreciate and respect the effort that you've put into this. I'm very new to Stack Overflow and to Python itself. I will post another question later on. Apologies for all the inconvenience caused. I thought my pseudo input would work that way. – H.Y Jul 20 '17 at 18:50
  • @H.Y no worries. If you have time, however, please do have a go at trying out some of the solutions in this thread with the original sample input. Once you've got it working, see if you can apply similar concepts and code to your new set of samples. You never know, it could take just slight "tweaks" of the code templates. Also, when you have the time, I'd appreciate it if you closed off this question by accepting one of the solutions below and casting votes (if it works, of course). This might help others in the Stack Overflow community too if they happen to stumble onto this post! – Atlas7 Jul 21 '17 at 07:36

3 Answers

0

I don't know what the + signs mean in your example. If they are part of the file you'll want to deal with them as well, but here is a way to extract the data (with a regex) into a dictionary, with the code as the key and the info as a list. Afterwards you can format it however you want.

This assumes that entries on the same line are separated by commas, but it can be adapted for any other separator. It also relies on every code being on its own line, with no info after it, as in your example.

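import re

res = {}

with open('in.txt', 'r') as f:
    current = None  # code of the record being filled; None until the first code line
    for line in f:
        if re.match(r"Code-\d+", line):
            # a new record starts: remember its code and create an empty list
            current = line.strip()
            res[current] = []
            continue
        if current:
            # split the detail line on commas and add the pieces to the record
            res[current] += line.strip().split(",")

print(res)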

result:

{'Code-6667+': ['Name of xyz company+', 'Address +', 'Number+', 'Contact person+'], 'Code-6668+': ['Name of abc company', 'Address', ' number', ' contact person+'], 'Code-6669+': ['name of company ', ' Address+', 'number ', ' contact person +']}
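
To get from this dictionary to one line per company, each list can be joined back into a single string. The following is only a sketch: `format_entries` is a made-up helper name, the sample dictionary is a shortened copy of the result above, and the + signs are left untouched because their meaning is unclear.

```python
def format_entries(res):
    """Turn {code: [detail, ...]} into a list of 'code - detail, detail' strings."""
    lines = []
    for code, details in res.items():
        # Trim stray spaces left over from splitting on commas.
        joined = ", ".join(d.strip() for d in details)
        lines.append("{0} - {1}".format(code, joined))
    return lines


# Shortened copy of the dictionary printed above (illustrative data only).
sample = {
    "Code-6667+": ["Name of xyz company+", "Address +", "Number+", "Contact person+"],
    "Code-6668+": ["Name of abc company", "Address", " number", " contact person+"],
}

for entry in format_entries(sample):
    print(entry)
```

Joining with ", " is an arbitrary choice here; any separator that does not occur in the data would work just as well.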
KGS
  • I get an error on the last line saying "current" is not defined: Traceback (most recent call last): line 13, in res[current] += line.strip().split(",") NameError: name 'current' is not defined – H.Y Jul 19 '17 at 17:34
  • That means the first line of your file is not a code as you described. I have edited the code a bit so this does not happen, but now such lines are simply skipped. If you need something more concrete, update your example input (or just adapt the code from the answer to fit it). – KGS Jul 19 '17 at 19:05
0

Your question wasn't really clear, but here is a snippet that prints one line per company, starting with "CodeXXXX - " and followed by the other details.

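FILEPATH = 'in.txt'  # assumed path to the input file; change as needed

with open(FILEPATH, 'r') as f:
    current_line = None
    for line in f:
        line = line.strip()
        if line.startswith('Code-'):
            # new company: print the previous one, if any
            if current_line is not None:
                print(current_line)

            # create a line that starts with 'CodeXXXX - '
            current_line = line.replace('-', '').replace('+', '') + ' - '

        elif current_line is not None:
            # detail line: append it to the current company
            current_line += line + ' '

    # print the last company, which is not followed by another code line
    if current_line is not None:
        print(current_line)

Output for your example input:

Code6667 - Name of xyz company+ Address + Number+ Contact person+ 
Code6668 - Name of abc company,Address, number, contact person+ 
Code6669 - name of company , Address+ number , contact person + 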
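
If the end goal is a two-column table rather than printed lines, the same loop can collect (code, details) rows and hand them to the `csv` module. This is a rough Python 3 sketch, not part of the snippet above: `group_by_code` is a hypothetical helper, the sample lines are a shortened copy of the example input, and an in-memory buffer stands in for a real output file.

```python
import csv
import io


def group_by_code(lines):
    """Yield (code, details) pairs; details are the lines up to the next code."""
    code, details = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("Code-"):
            if code is not None:
                yield code, " ".join(details)
            code, details = line, []
        elif code is not None:
            details.append(line)
    if code is not None:
        # Emit the final record, which no later code line would trigger.
        yield code, " ".join(details)


# Shortened copy of the example input (illustrative data only).
sample = [
    "Code-6667+\n",
    "Name of xyz company+ \n",
    "Address +\n",
    "Code-6668+\n",
    "Name of abc company,Address, number, contact person+\n",
]

buffer = io.StringIO()  # stands in for open("out.csv", "w", newline="")
writer = csv.writer(buffer)
writer.writerow(["code", "details"])
writer.writerows(group_by_code(sample))
print(buffer.getvalue())
```

The `csv` writer quotes any details field that contains commas, so the second column stays in one piece when the file is opened as a table.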
valentinarho
0

(Note: I'm not quite sure whether you want to keep the + sign. The following code assumes you do. Otherwise it's easy to get rid of the + with a bit of string manipulation.)
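
For reference, dropping the + could look roughly like this. It is a sketch only: `strip_plus` is a made-up name, and it assumes the + only ever appears as an end-of-line marker, as it does in the sample data.

```python
def strip_plus(text):
    """Remove a trailing '+' marker and the whitespace around it from one line."""
    # Assumption: '+' never occurs inside the real data, only at the end of a line.
    return text.strip().rstrip("+").rstrip()


print(strip_plus("Address +\n"))   # prints: Address
print(strip_plus("Code-6667+"))    # prints: Code-6667
```

Calling such a helper in place of the plain `line.strip()` would be enough to clean each line as it is read.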

 Input file

Here is the input file...

dat1.txt:

Code-6667+
Name of xyz company+ 
Address +
Number+ 
Contact person+
Code-6668+
Name of abc company,Address, number, contact person+
Code-6669+
name of company , Address+
number , contact person +

Code

Here is the code. Each print call passes a single parenthesised argument, so it behaves the same under Python 2.x and 3.x.

mycode.py:

import sys
print(sys.version)

# initialise our final output - a phone book
phone_book = {}

# parse text file data to phone book, in a specific format
code = ''
with open("dat1.txt", "r") as f:  # the file is closed automatically
    for line in f:
        if line.startswith('Code-'):
            # drop the hyphen: 'Code-6667+' becomes 'Code6667+'
            code = (line[:4] + line[5:]).strip()
            phone_book[code] = []
        elif code:
            # detail line: attach it to the most recent code
            phone_book[code].append(line.strip())

# print result to console (for ease of debugging). Comment this block if you want:
for key, value in phone_book.items():
    print("{0} - Company details: {1}".format(key, value))

# write phone_book to dat2.txt
with open("dat2.txt", "w") as f2:
    for key, value in phone_book.items():
        f2.write("{0} - Company details: {1}\n".format(key, value))

 Output

Here is what you will see in console (via print()) or dat2.txt (via f2.write())...

# Code6667+ - Company details: ['Name of xyz company+', 'Address +', 'Number+', 'Contact person+']
# Code6668+ - Company details: ['Name of abc company,Address, number, contact person+']
# Code6669+ - Company details: ['name of company , Address+', 'number , contact person +']

Atlas7
  • I tried the code. I get an error here: `# add details to company phone_book[code].append(line.strip())`, where it says "code" is not defined – H.Y Jul 20 '17 at 06:24
  • I've added a bit more "robustness" in the code - try again please? – Atlas7 Jul 20 '17 at 09:06
  • I'm not really familiar with dictionaries yet; I'm still reading up on them. The code works fine now, but how do I print or write the result? I ran the code to the end and it's fine. I'm trying to print phone_book – H.Y Jul 20 '17 at 09:20
  • Are you using Python 2.x or 3.x? The code is written in Python 3.x. In Python 3.x, the print syntax is `print("hello")`. In Python 2.x however, the syntax is `print "hello"`. – Atlas7 Jul 20 '17 at 09:25
  • I've added an extra block of code below the print step to write out result to `dat2.txt` too. Check it out! :) – Atlas7 Jul 20 '17 at 09:45
  • I'm using PyCharm with the Python interpreter set to Python 2.7.13. – H.Y Jul 20 '17 at 09:45
  • In that case, if you want to print to console, change the `print(xxx)` to `print xxx`. The rest of the code should be Python 2.x/3.x compatible, I think. Let me know if you still get errors? – Atlas7 Jul 20 '17 at 09:54
  • Writes an empty file to the desktop. :S – H.Y Jul 20 '17 at 09:57
  • hmmm it worked for me. Could you do me a favour and try doing this outside pycharm for the time being. i.e. within the same directory, have two files: `dat1.txt` (your input file), `mycode.py` (the actual code). Then navigate to that directory via command line interface. and then do a `python mycode.py`. You should see a `dat2.txt` freshly created there. – Atlas7 Jul 20 '17 at 10:03
  • (Regarding pycharm, I have a gut feeling it is something to do with how pycharm treats relative path. You may need to check what current working directory you are in within pycharm (current working directory should be where both code and `dat1.txt` sit). This stackoverflow might help with the pycharm path bit - https://stackoverflow.com/questions/34304044/pycharm-current-working-directory) – Atlas7 Jul 20 '17 at 10:07
  • @Atlas7 I've included the actual input. I also opened another project in PyCharm with the latest Python version as the interpreter. The input file is in the same working directory as the code – H.Y Jul 20 '17 at 17:29
  • I've updated the code (in particular the print statement for Python 2.x). Also attached a screenshot on my end. Try again? (Only the three files are relevant: `mycode.py`, `dat1.txt`, `dat2.txt`.) – Atlas7 Jul 20 '17 at 18:08
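
To check the working-directory theory from the comments above, it may help to print where relative paths resolve and to build the file paths from the script's own location instead. This is a Python 3 sketch under that assumption: `path_beside` is a hypothetical helper, and in a real script you would pass `__file__` instead of the literal `"mycode.py"`.

```python
import os


def path_beside(script_path, filename):
    """Build an absolute path to `filename` in the same folder as `script_path`."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, filename)


# Where relative paths such as "dat2.txt" currently resolve to:
print("Working directory:", os.getcwd())

# In a real script, pass __file__ as the first argument (assumption: the
# input file sits next to the script).
print(path_beside("mycode.py", "dat1.txt"))
```

If the printed working directory is not the folder that holds `dat1.txt`, the script is reading from and writing to a different place than expected.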