
I've only recently started with Python, but I'm stuck on a problem.

import re
import urllib2

# Function that reads the URLs and processes the data
# the way I need it.
# (priceids, itemtxt2 and totalitemnumber are defined elsewhere in the script.)


def htmlreader(i):
    # Fetch the page for the next batch of up to 200 IDs; i is the offset set by the loop below.
    pricedata = urllib2.urlopen(
        "http://website.com/" + ",".join(priceids.split(",")[i:i + 200])).read()

    # Here my information processing begins, but that part works fine.
    pricewebstring = pricedata.split("},{")
    # Results in a list of lists of digit strings,
    # e.g. [['1234', '2345', '3456'], ['3456', '4567', '5678']].
    array1 = [re.findall(r"\d+", a) for a in pricewebstring]

    # Writes the obtained array to my text file
    itemtxt2.write(str(array1) + '\n')

i = 0
while i <= totalitemnumber:
    htmlreader(i)
    i = i + 200

See the comments in the script as well.
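For readers unfamiliar with the slicing in `htmlreader`, here is a minimal Python 3 sketch of the two steps it performs: grouping the comma-separated IDs into batches of 200, and splitting a response on `},{` before pulling out the digit runs. The helper names `batches` and `parse_prices` and the sample data are hypothetical; the response format is only assumed from the `split("},{")` call.

```python
import re


def batches(priceids, size=200):
    """Yield comma-joined groups of at most `size` IDs (hypothetical helper)."""
    ids = priceids.split(",")
    for start in range(0, len(ids), size):
        yield ",".join(ids[start:start + size])


def parse_prices(pricedata):
    """Split a response on '},{' and extract the digit runs from each record (hypothetical helper)."""
    return [re.findall(r"\d+", record) for record in pricedata.split("},{")]


# Hypothetical sample data standing in for priceids and a downloaded page.
sample_ids = ",".join(str(n) for n in range(1, 451))
sizes = [len(b.split(",")) for b in batches(sample_ids)]
print(sizes)  # [200, 200, 50]

parsed = parse_prices("[{1234,2345,3456},{3456,4567,5678}]")
print(parsed)  # [['1234', '2345', '3456'], ['3456', '4567', '5678']]
```

Iterating with `range(0, len(ids), size)` stops as soon as the IDs run out, so this sketch never builds an empty batch.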

This runs in a loop, and each call produces a new array (array1).

Because I write each array to the txt file separately, the file ends up with one array per line. I need one big array, so the results of every htmlreader(i) call need to be merged.

So my output is something like:

[[1234,2345,3456],[3456,4567,5678]]
[[6789,4567,2345],[3565,1234,2345]]

But I want:

[[1234,2345,3456],[3456,4567,5678],[6789,4567,2345],[3565,1234,2345]]
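In list terms, the desired result is just the two batches concatenated. A minimal check using the sample values from the question (written as integers, as above):

```python
first = [[1234, 2345, 3456], [3456, 4567, 5678]]
second = [[6789, 4567, 2345], [3565, 1234, 2345]]

# `+` joins the outer lists, so the inner lists end up side by side
# rather than nested one level deeper.
merged = first + second
print(merged)
# [[1234, 2345, 3456], [3456, 4567, 5678], [6789, 4567, 2345], [3565, 1234, 2345]]
```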

Any ideas how I can approach this?

thefourtheye
SecondLemon
  • You can use `+` to combine arrays. You could also use `[htmlreader(i) for i in range(0, totalitemnumber, 200)]` – user3467349 Feb 25 '15 at 13:23
  • Yes, but each time the array is defined by array1, so I can't add them, right? The first time, array1 = [[1234,2345,3456],[3456,4567,5678]]. This is stored because it writes array1 to a txt file. Then on the second run array1 is redefined because of the new web address. – SecondLemon Feb 25 '15 at 13:28
  • You need to pass `array1` out with a `return` statement, then either append or add it to your main array, or use a list comprehension as I mentioned above. You can't manipulate the objects once you've written them to a text file if your goal is to have a single array. Also, you may want to pickle it (I'm guessing this is for further Python processing, since you want a single array). – user3467349 Feb 25 '15 at 13:36
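A minimal Python 3 sketch of the list-comprehension and pickle suggestions. Note that the comprehension alone yields one list per batch (one level of nesting too many), so it is flattened here with `itertools.chain.from_iterable`. `FAKE_RESPONSES` and the item count of 400 are hypothetical stand-ins for the real downloads and `totalitemnumber`.

```python
import itertools
import pickle
import re

# Hypothetical stand-ins: two pre-downloaded responses keyed by offset,
# and an item count that gives exactly two batches of 200.
FAKE_RESPONSES = {0: "1234,2345,3456},{3456,4567,5678",
                  200: "6789,4567,2345},{3565,1234,2345"}
totalitemnumber = 400


def htmlreader(i):
    # Sketch: returns the batch instead of writing it to a file.
    pricedata = FAKE_RESPONSES[i]  # stands in for urllib2.urlopen(...).read()
    return [re.findall(r"\d+", a) for a in pricedata.split("},{")]


# The comprehension produces one list per batch...
nested = [htmlreader(i) for i in range(0, totalitemnumber, 200)]
# ...so chain.from_iterable flattens it into a single list of records.
flat = list(itertools.chain.from_iterable(nested))

# pickle stores the Python object itself, so it loads back unchanged.
restored = pickle.loads(pickle.dumps(flat))
print(restored == flat)  # True
```

To save to disk instead, `pickle.dump(flat, f)` with the file opened in binary mode (`"wb"`) works the same way.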

1 Answer


Since you want all the elements in a single list, you can collect them in one list and extend it with each batch, so the result stays flat, like this:

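def htmlreader(i, result):
    ...
    # Add each inner list from this batch to the shared result list
    result.extend([re.findall(r"\d+", a) for a in pricewebstring])

i, result = 0, []
while i <= totalitemnumber:
    htmlreader(i, result)  # result grows with every batch
    i = i + 200

# Write the whole merged list once, after the loop
itemtxt2.write(str(result) + '\n')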
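A self-contained Python 3 sketch of the accumulator version above. It assumes canned responses in a hypothetical `FAKE_RESPONSES` dictionary instead of real downloads, a hypothetical `totalitemnumber` of 400, and an in-memory `io.StringIO` in place of the `itemtxt2` file.

```python
import io
import re

# Hypothetical stand-ins for the downloaded pages (keyed by offset),
# the item count, and the output file.
FAKE_RESPONSES = {0: "1234,2345,3456},{3456,4567,5678",
                  200: "6789,4567,2345},{3565,1234,2345"}
totalitemnumber = 400
itemtxt2 = io.StringIO()


def htmlreader(i, result):
    pricedata = FAKE_RESPONSES[i]  # replaces urlopen(...).read()
    pricewebstring = pricedata.split("},{")
    # extend adds each inner list to `result` instead of nesting the batch
    result.extend([re.findall(r"\d+", a) for a in pricewebstring])


i, result = 0, []
while i < totalitemnumber:
    htmlreader(i, result)
    i += 200

# One write, one line, containing every record
itemtxt2.write(str(result) + "\n")
print(itemtxt2.getvalue())
```

The loop here uses `<` rather than `<=`: assuming `totalitemnumber` is the number of IDs, `<=` would make one extra call for an empty slice whenever the count is an exact multiple of 200 (in this sketch, a `KeyError` for offset 400).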

Here, `extend` adds each inner list produced by the list comprehension to `result`, so the batches are merged into one list instead of being nested. Finally, the entire list is written to the file in one go.
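To see why `extend` (rather than `append`) keeps the result flat, here is a small comparison on made-up batches of digit strings:

```python
result = [["1234", "2345", "3456"]]
batch = [["6789", "4567", "2345"], ["3565", "1234", "2345"]]

appended = list(result)
appended.append(batch)  # the whole batch becomes ONE new element
print(len(appended))    # 2

extended = list(result)
extended.extend(batch)  # each record in the batch is added separately
print(len(extended))    # 3
```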

If that approach is confusing, you can instead have htmlreader return its batch and extend result in the loop:

def htmlreader(i):
    ...
    # Return the batch instead of writing it to the file
    return [re.findall(r"\d+", a) for a in pricewebstring]

i, result = 0, []
while i <= totalitemnumber:
    result.extend(htmlreader(i))
    i = i + 200

itemtxt2.write(str(result) + '\n')
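A runnable Python 3 sketch of this return-based version, again with a hypothetical `FAKE_RESPONSES` dictionary standing in for the `urlopen` call and a hypothetical item count of 400:

```python
import re

# Hypothetical stand-in for the downloaded pages, keyed by offset.
FAKE_RESPONSES = {0: "1234,2345,3456},{3456,4567,5678",
                  200: "6789,4567,2345},{3565,1234,2345"}


def htmlreader(i):
    pricedata = FAKE_RESPONSES[i]  # replaces urlopen(...).read()
    # The function no longer touches `result`; it just hands its batch back.
    return [re.findall(r"\d+", a) for a in pricedata.split("},{")]


result = []
for i in range(0, 400, 200):  # 400 is a hypothetical item count
    result.extend(htmlreader(i))

print(result)
```

One advantage of this shape is that `htmlreader` no longer modifies anything outside itself, so it can be called and checked on its own.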
thefourtheye
  • Yeah, he wasn't using `findall` correctly. Also, his data may be too big to write as an entire list, but in that case it's meaningless for it to be one Python array, since you can't load it into memory anyway. – user3467349 Feb 25 '15 at 13:33
  • @user3467349 Yeah, I think it would be better for him to collect the data in a list like I have shown in the answer. – thefourtheye Feb 25 '15 at 13:34
  • Thanks, works perfectly. It will take me some time to understand it, but I'll manage :) Accepted and +1 – SecondLemon Feb 25 '15 at 13:36
  • @SecondLemon Thanks :) Please check [this answer](http://stackoverflow.com/a/252711/1903116) for differences between `append` and `extend` methods in list. – thefourtheye Feb 25 '15 at 13:39