
I am new to Python. I have a list in .txt format (and .csv) like this:

NEW YORK ....... from       
31 Chatty, Seager   Aarhaus     
Atlas, Jones    Abertham        
Polly, Manning Antwerpen        
Amazon, Brittle Belchental      
LONDON  ........ for        
31 Park  Dattemroed     
Eleanor, Mallett Civeta Naples      
3 Aurora Frigate    Ljubljana

and I want to get:

NEW YORK .......  from 31 Chatty, Seager    Aarhaus     
NEW YORK .......  from Atlas, Jones Abertham        
NEW YORK .......  from Polly, Manning Antwerpen     
NEW YORK .......  from Amazon, Brittle  Belchental      
LONDON  ........ for 31 Park  Dattemroed        
LONDON  ........ for Eleanor, Mallett Civeta Naples     
LONDON  ........ for 3 Aurora Frigate   Ljubljana

I tried to use regex, but I could not get the result I wanted.

I wonder whether there is a way to do this.

i2_
  • "*I wonder whether there is a way to do this.*" -- Yes, there is. In fact, there are a great many ways to do it. Which way one does it depends upon many factors that you haven't shared. Perhaps you could show us what you've tried so far, and how it worked or didn't. Then we could build upon your work. – Robᵩ Jun 07 '16 at 21:34
  • Why are there 8 dots after `LONDON` but only 7 after `NEW YORK`? If you want to use Regex, you might want a [regular language](http://stackoverflow.com/questions/6718202/what-is-a-regular-language) – OneCricketeer Jun 07 '16 at 21:36
  • Thanks, I was trying to organise by the uppercase words. I should have posted my attempt, but I had almost nothing. – i2_ Jun 08 '16 at 12:18
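
On the point about the differing number of dots: one way to make a regex approach tolerant of that is to accept any run of dots in the header line. The following is only a sketch under assumptions that the question does not confirm: a header is taken to be an uppercase name, then four or more dots, then a single word such as `from` or `for`. The names `HEADER`, `is_header` and `prefix_entries` are made up for this illustration.

```python
import re

# Assumption: a header is an uppercase name, a run of four or more dots,
# then a single trailing word such as "from" or "for".
HEADER = re.compile(r'^[A-Z][A-Z ]*?\s+\.{4,}\s+\w+\s*$')


def is_header(line):
    """Return True if the line looks like 'NEW YORK ....... from'."""
    return HEADER.match(line) is not None


def prefix_entries(lines):
    """Yield each entry line prefixed with the most recent header line."""
    header = None
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # ignore blank lines
        if is_header(line):
            header = line  # remember the header, do not emit it on its own
        elif header is not None:
            yield '{} {}'.format(header, line)


sample = [
    'NEW YORK ....... from       \n',
    '31 Chatty, Seager   Aarhaus     \n',
    'LONDON  ........ for        \n',
    '31 Park  Dattemroed     \n',
]
result = list(prefix_entries(sample))
for row in result:
    print(row)
```

Because the pattern uses `\.{4,}`, it does not matter whether a header has 7 or 8 dots. Entry lines that appear before any header are silently dropped in this sketch.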

3 Answers


Here is one program that prints the output you want:

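with open('x.in') as input_file:
    for line in input_file:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        if '....' in line:
            city = line  # remember the header line; it is not printed on its own
            continue
        print(city, line)  # Python 3: prefix each entry with the current header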

Result:

NEW YORK ....... from 31 Chatty, Seager   Aarhaus
NEW YORK ....... from Atlas, Jones    Abertham
NEW YORK ....... from Polly, Manning Antwerpen
NEW YORK ....... from Amazon, Brittle Belchental
LONDON  ........ for 31 Park  Dattemroed
LONDON  ........ for Eleanor, Mallett Civeta Naples
LONDON  ........ for 3 Aurora Frigate    Ljubljana
Robᵩ

If the city lines always contain a run of dots (at least seven, as in your sample), you can use itertools.groupby:

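from itertools import groupby

# your_file is a placeholder for the path to the input file
with open(your_file) as f:
    # consecutive lines are grouped by whether they contain the run of dots
    grps = groupby(f, key=lambda line: "......." in line)
    for k, v in grps:
        if k:
            # header group: take its first line, then consume the next group (its entries)
            head = next(v).strip()
            print("\n".join("{} {}".format(head, line.strip()) for line in next(grps)[1]))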

Which would give you:

NEW YORK ....... from 31 Chatty, Seager   Aarhaus
NEW YORK ....... from Atlas, Jones    Abertham
NEW YORK ....... from Polly, Manning Antwerpen
NEW YORK ....... from Amazon, Brittle Belchental
LONDON  ........ for 31 Park  Dattemroed
LONDON  ........ for Eleanor, Mallett Civeta Naples
LONDON  ........ for 3 Aurora Frigate    Ljubljana
Padraic Cunningham
  • Thanks! I was trying to organise by the uppercase words, but with your help I did it. – i2_ Jun 08 '16 at 12:23
  • I just realized that when I use `.write` rather than `print` to save to an external file, there is a problem. The program cannot tell whether the next line has dots or not, and gives me: `NEW YORK ....... from 31 Chatty, Seager Aarhaus NEW YORK ....... from Atlas, Jones Abertham NEW YORK ....... from Polly, Manning Antwerpen NEW YORK ....... from Amazon, Brittle Belchental LONDON ........ for 31 Park Dattemroed LONDON ........ for Eleanor, Mallett Civeta Naples LONDON ........ for 3 Aurora Frigate Ljubljana` – i2_ Dec 04 '16 at 21:50
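
A likely cause of the run-together output in the last comment is that `write()` does not append a newline the way `print()` does, so each entry needs an explicit `'\n'`. The sketch below adapts the groupby approach to write to a file. The function name `write_grouped` and the temporary output path are assumptions made for this illustration, not part of the original answer.

```python
import os
import tempfile
from itertools import groupby


def write_grouped(lines, out_path):
    """Write every entry prefixed with its header, one entry per output line."""
    with open(out_path, 'w') as out:
        grps = groupby(lines, key=lambda line: '.......' in line)
        for is_header, group in grps:
            if is_header:
                head = next(group).strip()
                # The default covers a header that is the last line of the input.
                entries = next(grps, (None, []))[1]
                for line in entries:
                    # write() adds no newline by itself, so add one explicitly
                    out.write('{} {}\n'.format(head, line.strip()))


sample = [
    'NEW YORK ....... from       \n',
    '31 Chatty, Seager   Aarhaus     \n',
    'Atlas, Jones    Abertham        \n',
    'LONDON  ........ for        \n',
    '31 Park  Dattemroed     \n',
]
# Hypothetical output location, used only for this demonstration.
out_path = os.path.join(tempfile.mkdtemp(), 'out.txt')
write_grouped(sample, out_path)
with open(out_path) as f:
    written = f.read().splitlines()
print(written)
```

Joining the entries with `"\n".join(...)` and passing the result to `write()` would still glue the last entry of one city to the first entry of the next, which is why this sketch ends every written entry with its own newline.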

Thank you!

Actually, I was trying to organise by the uppercase words. By changing Padraic Cunningham's code, I did this:

import re

for line in Text:
    # append ≈ to a leading word of three or more uppercase letters
    newline = re.sub('^([A-Z][A-Z]+[A-Z])', '\\1≈', line)

≈ is just a marker showing that the line starts with an uppercase word. Then:

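from itertools import groupby

# f_ is assumed here to hold the marked lines produced by the substitution
grps = groupby(f_, key=lambda line: "≈" in line)
for k, v in grps:
    if k:
        head = next(v).strip()
        print('\n'.join('{} {}'.format(head, line.strip()) for line in next(grps)[1]))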
i2_
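
As a follow-up sketch of the uppercase-word idea: the ≈ marker is not strictly needed, because groupby can use the regex test itself as its key. This assumes, as the substitution pattern does, that a header line starts with at least three uppercase letters and that no entry line does. The names `UPPER_START` and `group_by_uppercase` are invented for this illustration.

```python
import re
from itertools import groupby

# Assumption: a header line starts with at least three uppercase letters,
# which is what '^([A-Z][A-Z]+[A-Z])' matches.
UPPER_START = re.compile(r'^[A-Z]{3,}')


def group_by_uppercase(lines):
    """Return the entry lines, each prefixed with its uppercase header line."""
    rows = []
    # Strip whitespace and drop blank lines before grouping.
    stripped = (line.strip() for line in lines if line.strip())
    grps = groupby(stripped, key=lambda line: UPPER_START.match(line) is not None)
    for is_header, group in grps:
        if is_header:
            head = next(group)
            # The default covers a header with no entries after it.
            for line in next(grps, (None, []))[1]:
                rows.append('{} {}'.format(head, line))
    return rows


sample = [
    'NEW YORK ....... from       \n',
    'Polly, Manning Antwerpen        \n',
    'LONDON  ........ for        \n',
    'Eleanor, Mallett Civeta Naples      \n',
    '3 Aurora Frigate    Ljubljana\n',
]
rows = group_by_uppercase(sample)
for row in rows:
    print(row)
```

Unlike the marker version, this leaves the header text unchanged in the output, so no ≈ has to be removed afterwards.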