
I am trying to extract information from a page with a lot of spaces in it, so I want to search for ANY letter and get its position, not just one specific letter. How can this be accomplished?

Edit: I want to search the file at http://www.aviationweather.gov/static/adds/metars/stations.txt for a user-inputted city, say Anchorage. The program would search for Anchorage and then grab the next four letters (the station code). The problem is that, because of the way the text is formatted, the number of spaces between the city name and the four-letter code is different for each town.

2 Answers


You can use

listed = text.split()

to split your text on any run of whitespace. You will then have a list containing only the non-whitespace tokens (the individual words), with all the spaces gone.

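citypos = listed.index("ANCHORAGE")  # station names are upper case in the file; raises ValueError if absent
code = listed[citypos+1][:4]  # first four characters of the token after the city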
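A minimal sketch wrapping the split/index idea in a function, assuming station names in the file are upper case and that the city name is a single word. The function name `code_after_city` and the sample line are hypothetical, for illustration only:

```python
def code_after_city(text, city):
    """Return the first four characters of the token that follows `city`.

    Assumes the city is one whitespace-separated word and that names
    in the text are upper case (hypothetical helper, not from the file).
    """
    listed = text.split()                  # split on any run of whitespace
    citypos = listed.index(city.upper())   # raises ValueError if the city is absent
    return listed[citypos + 1][:4]         # IndexError if the city is the last token


sample = "AK SOMETOWN      ABCD  XYZ"  # hypothetical line with uneven spacing
print(code_after_city(sample, "Sometown"))  # ABCD
```

One caveat: for a multi-word name such as "ANCHORAGE INTL", the token after "ANCHORAGE" is "INTL", not the station code, so this only works when the name is a single word. The fixed-width approach in the other answer avoids that problem.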

To find the position of every letter and digit in the text:

positions = []
for y, x in enumerate(text):
    if x.isalnum():  # letters and digits only
        positions.append(y)

That last snippet answers the question as it was asked before you edited it.
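The position-finding loop above can also be used to skip a variable number of spaces directly. This is a sketch, assuming the city name appears verbatim in the line; `next_alnum`, `alnum_positions` and the sample line are hypothetical names chosen for illustration:

```python
def alnum_positions(text):
    """Indices of every letter or digit in text (same result as the loop above)."""
    return [i for i, ch in enumerate(text) if ch.isalnum()]


def next_alnum(text, start):
    """Index of the first letter or digit at or after start, or -1 if none."""
    for i in range(start, len(text)):
        if text[i].isalnum():
            return i
    return -1


line = "AK SOMETOWN      ABCD  XYZ"  # hypothetical line with uneven spacing
city = "SOMETOWN"
end = line.index(city) + len(city)   # first position after the city name
start = next_alnum(line, end)        # skip however many spaces follow
code = line[start:start + 4]         # grab the next four characters
print(code)  # ABCD
```

Scanning from the end of the city name means the number of spaces no longer matters, which is the core of the original question.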

Dalen
  • Thanks for the response! I think I will use this as it seems efficient. –  Jul 08 '15 at 02:56

It looks like you're parsing fixed-width records, so the struct module will be handy here. See this answer for examples.

What you'll want to do is define a format string for the records and then call struct.unpack to convert each record into a tuple of values. You can pair that with a namedtuple definition to make the fields accessible by name. Here is a limited example using just the first few fields:

from collections import namedtuple
from struct import unpack

Weather = namedtuple('Weather', 'cd station icao iata')  # define the fieldnames
metar_fmt = '2s x 16s x 4s xx 3s xx'  # 's' is a fixed-length string, 'x' is a skipped byte
record = 'AK ANCHORAGE INTL   PANC  ANC  '
# struct works on bytes in Python 3: encode the line, then decode each field back to str
w = Weather._make(f.decode('ascii') for f in unpack(metar_fmt, record.encode('ascii')))

# now you can use your namedtuple by fieldname:
print(w.cd, w.station, w.icao, w.iata)
if w.station.startswith('ANCHORAGE'):
    print(w.icao)
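To apply the same format string to many lines, one option is struct.unpack_from, which reads only the first calcsize(fmt) bytes and so tolerates lines that carry extra columns after the IATA code. This is a sketch, assuming every data line in the file follows the layout shown above; `parse_line` and `find_icao` are hypothetical helper names:

```python
from collections import namedtuple
from struct import calcsize, unpack_from

Weather = namedtuple('Weather', 'cd station icao iata')
METAR_FMT = '2s x 16s x 4s xx 3s xx'  # same layout as above (assumed to match the file)
RECORD_LEN = calcsize(METAR_FMT)      # 31 bytes


def parse_line(line):
    """Parse the leading fixed-width fields of one line, or return None if too short."""
    raw = line.encode('ascii', 'replace')  # non-ASCII characters become '?'
    if len(raw) < RECORD_LEN:
        return None
    # unpack_from ignores anything past RECORD_LEN, so longer lines are fine
    fields = unpack_from(METAR_FMT, raw)
    return Weather._make(f.decode('ascii').strip() for f in fields)


def find_icao(lines, city):
    """Return ICAO codes of all stations whose name starts with city (case-insensitive)."""
    city = city.upper()
    codes = []
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec.station.startswith(city):
            codes.append(rec.icao)
    return codes


lines = ['AK ANCHORAGE INTL   PANC  ANC  ', 'short line']
print(find_icao(lines, 'Anchorage'))  # ['PANC']
```

Because the station field is sliced by column rather than by whitespace, multi-word names like "ANCHORAGE INTL" are handled without any special casing.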
tzaman