0

My Python script reads a file that looks like this:

#im a useless comment
this is important

I wrote a script to read and split the "this is important" part and ignore the comment lines that start with #.

I only need the first and the last word (In my case "this" and "important").

Is there a way to tell Python that I don't need certain parts of a split?

In my example I have what I want and it works.

However, if the string is longer and I end up with something like 10 unused variables, I guess that is not how programmers would do it.

Here is my code:

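#!/usr/bin/python3

import re

filehandle = open("file")
for line in filehandle:  # iterate over the open handle (the name `file` is undefined)

    if re.search("#", line):
        continue  # skip the comment line; `break` would end the loop
    else:
        a, b, c = line.split(" ")
        print(a)  # first word
        print(c)  # last word

filehandle.close()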
jwpfox
Nico
  • Maybe `x=line.split(); print(x[0],x[-1])` ? – Alex Sep 12 '17 at 11:42
  • `a = line.split(' is ')` – JJAACCEeEKK Sep 12 '17 at 11:44
  • @JJAACCEeEKK: if I split on "is", the word "this" also gets split. – Harsha Biyani Sep 12 '17 at 11:48
  • For the case you are working on `line.split()` is superior to `line.split(" ")`. Read more about str.split here https://docs.python.org/3/library/stdtypes.html?highlight=split#str.split "If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace." – jwpfox Sep 12 '17 at 11:58

6 Answers

1

Another possibility would be:

a, *_, b = line.split()
print(a, b)
# <a> <b>

Note that starred assignment (a, *_, b = ...) is Python 3 only: it was introduced in Python 3.0 by PEP 3132 and is a syntax error in Python 2.
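As a rough sketch of how the starred form behaves on lines of different lengths (the helper name `first_and_last` and the sample strings are made up for illustration):

```python
def first_and_last(line):
    """Return the first and last whitespace-separated words of a line."""
    # The starred name collects every word in between into a throwaway list.
    first, *_, last = line.split()
    return first, last


print(first_and_last("this is important"))
print(first_and_last("one two three four five six"))

# The pattern needs at least two words; a single word raises ValueError.
try:
    first_and_last("alone")
except ValueError as exc:
    print("too short:", exc)
```

The number of words in the middle does not matter, so no extra variables are needed as the line grows.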

Dave J
0

You can save the result to a list, and get the first and last elements:

res = line.split(" ")
# res[0] and res[-1]

If you want every third element, starting with the first (indices 0, 3, 6, ...), you can use a slice with a step:

res[::3]

Otherwise, if you don't have a specific pattern, you'll need to manually extract elements by their index.
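A minimal sketch of both cases, assuming a made-up sample string: a stepped slice for a regular pattern, and `operator.itemgetter` from the standard library as one possible way to pull out an arbitrary set of positions in a single call.

```python
from operator import itemgetter

words = "zero one two three four five six".split()

# Regular pattern: a slice with a step of 3 picks indices 0, 3 and 6.
every_third = words[::3]

# No regular pattern: list the wanted positions explicitly.
# The positions (0, 2, -1) are an arbitrary choice for this example.
pick = itemgetter(0, 2, -1)
selected = pick(words)  # a tuple with the elements at those positions

print(every_third)
print(selected)
```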

See the split documentation for more details.

Maroun
0

If I've understood your question, you can try this:

s = "this is a very very very veeeery foo bar bazzed looong string"
splitted = s.split() # splitted is a list
splitted[0] # first element
splitted[-1] # last element

str.split() returns a list of the words in the string, using sep as the delimiter string. ... If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.

In that way you can get the first and the last words of your string.
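To illustrate the quoted behaviour, here is a small comparison of `split(" ")` and `split()` on a line as it might come out of a file; the sample line with a double space and a trailing newline is an assumption made for the example.

```python
line = "this  is important\n"  # double space and trailing newline

by_space = line.split(" ")   # splits on every single space character
by_whitespace = line.split() # splits on runs of any whitespace

print(by_space)
print(by_whitespace)

# With split(" ") the last element keeps the newline and the double
# space produces an empty string; split() avoids both.
print(repr(by_space[-1]), repr(by_whitespace[-1]))
```

One thing to keep in mind: on a blank line `split()` returns an empty list, so indexing with `[0]` or `[-1]` raises `IndexError` unless blank lines are skipped first.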

jwpfox
floatingpurr
0

Instead of this line in your code:

a,b,c = line.split(" ")

use:

splitLines = line.split(" ")
a, b, c = splitLines[0], splitLines[1:-1], splitLines[-1]

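Negative indices in Python count from the end of the sequence, so splitLines[-1] is the last element and splitLines[1:-1] is everything between the first and the last.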

dalonlobo
0

I think Python's negative indexing can solve your problem:

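import re

filehandle = open("file")
for line in filehandle:  # iterate over the open handle, not the undefined name `file`

    if re.search("#", line):
        continue  # skip comment lines; `break` would stop at the first one
    else:
        split_word = line.split()
        print(split_word[0])   # first word
        print(split_word[-1])  # last word

filehandle.close()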

Read more about Python Negative Index
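A self-contained variant of the same idea, sketched under a few assumptions: the function name `first_last_words` is hypothetical, `io.StringIO` stands in for the real file so the example runs anywhere, and blank lines are skipped because indexing an empty list would fail.

```python
import io


def first_last_words(lines):
    """Yield (first, last) for every line that is neither blank nor a comment."""
    for line in lines:
        words = line.split()
        # Skip blank lines and lines whose first word starts with '#'.
        if not words or words[0].startswith("#"):
            continue
        yield words[0], words[-1]


# Stand-in for the file from the question, plus a blank and a one-word line.
sample = io.StringIO("#im a useless comment\nthis is important\n\nsingle\n")
pairs = list(first_last_words(sample))
print(pairs)
```

With a real file, the same function can be fed an open handle, for example inside `with open("file") as fh:`, which also closes the file automatically.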

Anurag Misra
0

For multiline text (with re.search() function):

import re

with open('yourfile.txt', 'r') as f:
    result = re.search(r'^(\w+).+?(\w+)$', f.read(), re.M)
    a,b = result.group(1), result.group(2)
    print(a,b)

The output:

this important
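Since `re.search()` stops at the first matching line, a possible extension for files with several data lines is `re.finditer()` with the same pattern. This is only a sketch; the sample text (including the second data line) is invented for the example.

```python
import re

# Stand-in for f.read(); the last line is an assumption for illustration.
text = "#im a useless comment\nthis is important\nanother line here\n"

# Same pattern as above; re.M makes ^ and $ match at every line boundary.
pattern = re.compile(r'^(\w+).+?(\w+)$', re.M)

# One (first, last) tuple per matching line; the comment line does not
# match because '#' is not a word character.
pairs = [m.groups() for m in pattern.finditer(text)]
print(pairs)
```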
RomanPerekhrest