
I need to analyze earthquake data, but before I can begin I have to change the format in which the data is listed, from:

14km WSW of Willow, Alaska$2.4
4km NNW of The Geysers, California$0.9
13km ESE of Coalinga, California$2.1
...

to:

["2.4, 14km WSW of Willow, Alaska", "0.9, 4km NNW of The Geysers, California",
"2.1, 13km ESE of Coalinga, California", ...]

The code that I have for the original format (omitting the url) is:

def fileToList(url):
    alist = []
    source = urllib2.urlopen(url)
    for line in source:
        items = line.strip()
        alist.append(items)
    return alist

I'm trying to create variables magnitude and earthquakeloc to rearrange the format of alist, but I just don't know where to start. I am very new to coding. Any suggestions would be wonderful, thank you.

Psidom
ari.montario

4 Answers


If you're worried about formatting, then I would use a collections.namedtuple as an intermediate value:

from collections import namedtuple

Data = namedtuple('Data', ['position', 'magnitude'])

mystr = """14km WSW of Willow, Alaska$2.4
4km NNW of The Geysers, California$0.9
13km ESE of Coalinga, California$2.1"""

list_of_data = []
for line in mystr.split('\n'):   # equivalent to your "for line in source"
    list_of_data.append(Data(*line.split('$')))

This will give you the following:

>>> list_of_data
[Data(position='14km WSW of Willow, Alaska', magnitude='2.4'),
 Data(position='4km NNW of The Geysers, California', magnitude='0.9'),
 Data(position='13km ESE of Coalinga, California', magnitude='2.1')]

Which can be easily manipulated:

>>> ['{x.magnitude}, {x.position}'.format(x=x) for x in list_of_data]
['2.4, 14km WSW of Willow, Alaska',
 '0.9, 4km NNW of The Geysers, California',
 '2.1, 13km ESE of Coalinga, California']

or sorted by magnitude:

>>> sorted(list_of_data, key=lambda x: float(x.magnitude))  # compare as numbers, not strings
[Data(position='4km NNW of The Geysers, California', magnitude='0.9'),
 Data(position='13km ESE of Coalinga, California', magnitude='2.1'),
 Data(position='14km WSW of Willow, Alaska', magnitude='2.4')]

In the end, a regex would probably make more sense if your data set is huge. But parsing the data with str.split and storing it in namedtuples is easy to understand, so I used that approach.
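For comparison, a regex-based variant might look like the sketch below. The pattern and the names `LINE_RE` and `parse_lines` are made up for illustration, and the sketch assumes the magnitude is whatever follows the last `$` on a line. It has not been benchmarked against the `str.split` approach.

```python
import re

# Hypothetical pattern: everything before the last '$' is the location,
# everything after it is the magnitude.
LINE_RE = re.compile(r'^(?P<position>.*)\$(?P<magnitude>[^$]*)$')


def parse_lines(lines):
    """Return 'magnitude, position' strings for every line that matches."""
    result = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # blank or malformed lines are skipped silently
            result.append('{magnitude}, {position}'.format(**match.groupdict()))
    return result


sample = """14km WSW of Willow, Alaska$2.4
4km NNW of The Geysers, California$0.9
13km ESE of Coalinga, California$2.1"""

print(parse_lines(sample.split('\n')))
```

Skipping lines that don't match is a design choice in this sketch; raising an error instead may be preferable if malformed input should not go unnoticed.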

MSeifert
  • You don't need the overhead of the `namedtuple` to create a `list` from a split. A list of `namedtuple`s is not the format the OP asked for either. – the_constant Feb 05 '17 at 18:53
  • But one can create the requested format (see `['{x.magnitude}, {x.position}'.format(x=x) for x in list_of_data]`). Also, if you think of `namedtuples` as overhead you're making a huge mistake. In this case there is a clear meaning to the parts of the string, so why prefer an unnamed "list" or "tuple"? – MSeifert Feb 05 '17 at 18:56
  • Namedtuple does in fact add overhead: you're having to look up the namedtuple definition. And every time you access a field in a named tuple, you not only have the attribute lookup, but the index lookup as well. So for a list where you're substituting in namedtuple attributes, you have a very real overhead. See: http://stackoverflow.com/questions/2646157/what-is-the-fastest-to-access-struct-like-object-in-python – the_constant Feb 05 '17 at 19:01
  • if one wants it fast any `split` approach is probably inferior to doing one regex. If one wants a good structure for the data then `namedtuple` is a great choice and he already stated he wanted to analyze the data. So in the next steps having the `namedtuple` will help him more than any "one-trick-pony" :D – MSeifert Feb 05 '17 at 19:02
  • You're correct on the regex, but as the user is new to coding, I avoided regex. I'm not disagreeing that I'd find a namedtuple useful for it in general, but I like to restrict my answers to solve the exact problem that the user requested help on. – the_constant Feb 05 '17 at 19:06
  • And adds icing on the cake, etc. It's a solution for the exact problem. But it's not the exact solution. That's the difference. – the_constant Feb 05 '17 at 19:12

Hint:

>>> a = "14km WSW of Willow, Alaska$2.4"
>>> a = a.split("$")   # split the string on '$'
>>> a
['14km WSW of Willow, Alaska', '2.4']
>>> a = a[::-1]        # reverse the list
>>> a
['2.4', '14km WSW of Willow, Alaska']
>>> ", ".join(a)       # join the parts with ', '
'2.4, 14km WSW of Willow, Alaska'

One-liner:

>>> a = "14km WSW of Willow, Alaska$2.4"
>>> ", ".join(a.split("$")[::-1])
'2.4, 14km WSW of Willow, Alaska'

Pythonic way for your expected output:

>>> myString = """14km WSW of Willow, Alaska$2.4
... 4km NNW of The Geysers, California$0.9
... 13km ESE of Coalinga, California$2.1"""
>>> [", ".join(x.split("$")[::-1]) for x in myString.strip().split("\n")]
['2.4, 14km WSW of Willow, Alaska', '0.9, 4km NNW of The Geysers, California', '2.1, 13km ESE of Coalinga, California']
Hackaholic
  • This was helpful for getting the information in the right order, with magnitude of the earthquake first followed by the location, but for some reason the .join() didn't seem to work in turning the information into a list of strings for each earthquake. – ari.montario Feb 05 '17 at 21:27
  • My output looks like: ['2.4', '14km WSW of Willow, Alaska']\n ['0.9', '4km NNW of The Geysers, California'] \n['2.1', '13km ESE of Coalinga, California']\n ... from this code: def fileToList(url): alist = [] source = urllib2.urlopen(url) for line in source: items = line.strip().split("$") alist.append(items[::-1]) return alist – ari.montario Feb 05 '17 at 21:27
  • How can I get ["2.4, 14km WSW of Willow, Alaska", "0.9, 4km NNW of The Geysers, California", "2.1, 13km ESE of Coalinga, California", ...] as my output? – ari.montario Feb 05 '17 at 21:27

Let's say your source variable contains the following lines:

14km WSW of Willow, Alaska$2.4
4km NNW of The Geysers, California$0.9
13km ESE of Coalinga, California$2.1

In the simplest case, the str.split and str.join methods are enough:

import urllib2

def fileToList(url):
    source = urllib2.urlopen(url)
    # iterate over the response line by line, skipping blank lines
    return [', '.join(l.strip().split('$')[::-1]) for l in source if l.strip()]

print(fileToList(url))  # url is omitted here, as in the question

The output should look like this:

['2.4, 14km WSW of Willow, Alaska', '0.9, 4km NNW of The Geysers, California', '2.1, 13km ESE of Coalinga, California']
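Note that `urllib2` exists only in Python 2. In Python 3 its counterpart is `urllib.request`, and iterating over the response yields `bytes` rather than `str`. The sketch below shows one way the same split-reverse-join idea could be written to accept either kind of line. The name `reformat_lines` and the UTF-8 default are assumptions, and a plain list stands in for the response object because the real URL is omitted in the question.

```python
def reformat_lines(source, encoding='utf-8'):
    """Turn 'location$magnitude' lines into 'magnitude, location' strings.

    `source` can be any iterable of lines, str or bytes.
    """
    result = []
    for raw in source:
        # Decode only when needed; the encoding is an assumption.
        line = raw.decode(encoding) if isinstance(raw, bytes) else raw
        line = line.strip()
        if line:  # skip blank lines
            result.append(', '.join(line.split('$')[::-1]))
    return result


# Stand-in for the lines a response object would yield.
fake_source = [
    b'14km WSW of Willow, Alaska$2.4\n',
    b'4km NNW of The Geysers, California$0.9\n',
    b'\n',
    b'13km ESE of Coalinga, California$2.1\n',
]

print(reformat_lines(fake_source))
```

Taking an iterable of lines instead of a URL keeps the formatting logic separate from the download, so it can be tried on sample data without network access.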
RomanPerekhrest

It seems that you're just trying to reorder the way each string is formatted, so if you have the initial data in a multiline string like so:

earthquake_data = """14km WSW of Willow, Alaska$2.4
4km NNW of The Geysers, California$0.9
13km ESE of Coalinga, California$2.1"""

then you can split it on the newlines to get a list of strings:

lines = earthquake_data.split('\n')
>>> ['14km WSW of Willow, Alaska$2.4', '4km NNW of The Geysers, California$0.9', '13km ESE of Coalinga, California$2.1']

and for each item of the list of data, split it on the '$' symbol, which will leave you a list of lists like this:

split_lines = [l.split('$') for l in lines]
>>> [['14km WSW of Willow, Alaska', '2.4'], ['4km NNW of The Geysers, California', '0.9'], ['13km ESE of Coalinga, California', '2.1']]

You can then join each of these lists back into strings using the str.join() string method on each item in a list comprehension:

reformatted_data = [", ".join([l[1], l[0]]) for l in split_lines]
>>> ['2.4, 14km WSW of Willow, Alaska', '0.9, 4km NNW of The Geysers, California', '2.1, 13km ESE of Coalinga, California']

Here it all is wrapped up in a function:

def reformatStrings(data):
    lines = data.split("\n")
    split_lines = [l.split('$') for l in lines]
    reformatted_data = [", ".join([l[1], l[0]]) for l in split_lines]
    return reformatted_data


print(reformatStrings(earthquake_data))
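One caveat: if the data ends with a newline or contains blank lines, `l[1]` raises an IndexError because an empty string splits into a single-item list. A more defensive variant might look like the sketch below. The name `reformat_strings_safe` is hypothetical, and silently skipping lines without a `$` is an assumption about how such lines should be treated.

```python
def reformat_strings_safe(data):
    """Like reformatStrings, but tolerant of blank lines and lines without '$'."""
    reformatted = []
    for line in data.splitlines():
        # rpartition splits on the LAST '$', so a '$' inside the location survives.
        location, sep, magnitude = line.strip().rpartition('$')
        if sep:  # sep is '' when the line contains no '$' (blank lines included)
            reformatted.append(", ".join([magnitude, location]))
    return reformatted


sample = "14km WSW of Willow, Alaska$2.4\n\n4km NNW of The Geysers, California$0.9\n"
print(reformat_strings_safe(sample))
```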
Zach Ward