This is a follow-up to an earlier question.

Using Racialz's answer, I can loop over every match with the regex, but only the last row of data is stored in the database. How do I store all of the data instead of only the last row?

for thisMatch in re.findall(r"<td>(.+?)</td>.+?<td>(.+?)</td>.+?<td>(.+?)</td>.+?<td>(.+?)</td>", match3, re.DOTALL):
    print(thisMatch[0], thisMatch[1], thisMatch[2])

sinfo = scrapyitem(name=thisMatch[0], hp=thisMatch[1], email=thisMatch[2])

try:
    sinfo().save

EDIT

match and match2 are just regexes that narrow down the search data. (I know this might be redundant, and some might ask me to use a parser instead.)

my_string = str(i)
match = re.search("\<!-- populate table from mysql database -->(.*?)\</tbody>", my_string).group(1)
match2 = re.findall('\<div class = "info">(.*?)</tr>', match)
match3 = str(match2)

data:

 <div class = "info"> 
  <div class="name"><td>random</td></div>
  <div class="hp"><td>123456</td></div>
  <div class="email"><td>random@mail.com</td></div> 
 </div>

 <div class = "info"> 
  <div class="name"><td>random123</td></div>
  <div class="hp"><td>654321</td></div>
  <div class="email"><td>random123@mail.com</td></div> 
 </div>

The only data saved to the database is:

  random123
  654321
  random123@mail.com

match3 gives me:

<div class="name"><td>random</td></div>
<div class="hp"><td>123456</td></div>
<div class="email"><td>random@mail.com</td></div> 


<div class="name"><td>random123</td></div>
<div class="hp"><td>654321</td></div>
<div class="email"><td>random123@mail.com</td></div> 
JustASimpleGuy
  • Is the statement `sinfo = scrapyitem(...` and then `sinfo.save()` inside the loop or outside? – AKS May 23 '16 at 04:53
  • I tried both; only the last row ended up in the database, which was weird to me – JustASimpleGuy May 23 '16 at 04:55
  • What is match3 in this case? Can you show more of the code where you get match3? Also that's a different regex than the one I wrote in my answer. What really matters is whether match3 is a single block or the entire HTML string containing multiple blocks. – Keatinge May 23 '16 at 05:33
  • The edit still doesn't make it clear what match3 is since we don't know what my_string or i is. Does `print(match3)` give you what you wrote in data: – Keatinge May 23 '16 at 05:43
  • I think the data is being overwritten in the database – rock321987 May 23 '16 at 06:11
  • @rock321987 If it's getting overwritten, how do I actually solve this? – JustASimpleGuy May 23 '16 at 06:46
  • Have you tried simply printing to see whether you are getting all the results? – rock321987 May 23 '16 at 06:48
  • Which print are you referring to? If it's this one, `print(thisMatch[0], thisMatch[1], thisMatch[2])`, then all the results are printed – JustASimpleGuy May 23 '16 at 07:01

1 Answer

I think you need to read the Python documentation for re.findall():

https://docs.python.org/2/library/re.html

re.findall(pattern, string, flags=0) Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
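
To make the quoted behaviour concrete, here is a minimal sketch of `re.findall()` with several capture groups. The sample markup is modelled on the data in the question, and the pattern is trimmed to three groups to match its three cells per record; both are assumptions for illustration, not the asker's real page.

```python
import re

# Sample markup modelled on the question's data (three cells per record).
html = """
<div class="name"><td>random</td></div>
<div class="hp"><td>123456</td></div>
<div class="email"><td>random@mail.com</td></div>
<div class="name"><td>random123</td></div>
<div class="hp"><td>654321</td></div>
<div class="email"><td>random123@mail.com</td></div>
"""

# Three capture groups, so each match comes back as a 3-tuple.
pattern = r"<td>(.+?)</td>.+?<td>(.+?)</td>.+?<td>(.+?)</td>"
rows = re.findall(pattern, html, re.DOTALL)

# rows is a list with one tuple per record, in the order found.
for row in rows:
    print(row)
```

Because the result is a list of tuples, every record is available while iterating; nothing is lost until the loop variable is reused after the loop has finished.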

In your code you iterate through all the matches in the list returned by re.findall(), but you do not store the individual sets of data anywhere. Once the loop has finished, thisMatch still refers to the last match only, so that is the only set that gets saved. You need to either do a save per set or collect the sets in a list and save each of them to your backend DB.

Try this:

sinfo = []
for thisMatch in re.findall(r"<td>(.+?)</td>.+?<td>(.+?)</td>.+?<td>(.+?)</td>.+?<td>(.+?)</td>", match3, re.DOTALL):
    print(thisMatch[0], thisMatch[1], thisMatch[2])
    sinfo.append(scrapyitem(name=thisMatch[0], hp=thisMatch[1], email=thisMatch[2]))

try:
    for scrapyInfo in sinfo:
        scrapyInfo.save()
...
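
As an alternative to collecting the items in a list, the save can happen inside the loop, once per match. The sketch below assumes a hypothetical `FakeItem` class standing in for `scrapyitem`, with a `save()` that appends to an in-memory list instead of writing to a real database, and uses a three-group pattern on made-up sample data.

```python
import re

saved_rows = []  # stand-in for the database table (assumption)


class FakeItem:
    """Hypothetical stand-in for scrapyitem; save() just records the row."""

    def __init__(self, name, hp, email):
        self.name, self.hp, self.email = name, hp, email

    def save(self):
        saved_rows.append((self.name, self.hp, self.email))


# Made-up sample input: two records, one cell per line.
match3 = "\n".join([
    "<td>random</td>", "<td>123456</td>", "<td>random@mail.com</td>",
    "<td>random123</td>", "<td>654321</td>", "<td>random123@mail.com</td>",
])

pattern = r"<td>(.+?)</td>.+?<td>(.+?)</td>.+?<td>(.+?)</td>"
for name, hp, email in re.findall(pattern, match3, re.DOTALL):
    # Create and save inside the loop so every match is persisted,
    # not just the one left in the loop variable afterwards.
    FakeItem(name=name, hp=hp, email=email).save()

print(saved_rows)
```

Either approach works; saving inside the loop avoids holding all items in memory, while the list makes it easier to wrap all saves in a single try block.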

Thanks,

-Abe.

Abraham
  • My mistake on having to delete, edit my answer and then undelete it. I misread your code in using re.finditer() instead of re.findall() however I believe I have addressed the issue now with the most recent edit. – Abraham May 23 '16 at 06:48
  • Sorry, I might sound dumb asking this, but how do I actually do a save per set in this case? I understand the general idea of what the error is – JustASimpleGuy May 23 '16 at 07:00
  • No problem and there is no such thing as a dumb question. Follow up to your question can you please outline what your scrapyitem class looks like? – Abraham May 23 '16 at 07:07
  • `class scrapyitem(DjangoItem): django_model = html` – JustASimpleGuy May 23 '16 at 07:10
  • I will edit my answer to provide you with a potential solution. – Abraham May 23 '16 at 07:12
  • I am sorry, I forgot to add that the scrapyitem class is imported from items.py (`from scrapybot.items import scrapyitem`); would that affect your code? I got this error: UnboundLocalError: local variable 'scrapyitem' referenced before assignment – JustASimpleGuy May 23 '16 at 08:55
  • It's okay, I already put in a global scrapyinfo and it's solved. Thanks so much for the help along the way – JustASimpleGuy May 23 '16 at 09:16
  • That's very nice, so did you have to resort to using the list of scrap infos? – Abraham May 23 '16 at 13:35
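
The thread does not show the code that raised the UnboundLocalError mentioned in the comments, so the following is only a sketch of one common cause: assigning to a name inside a function makes that name local for the whole function, which hides an imported or module-level class of the same name. `Item`, `broken` and `fixed` are hypothetical names used for illustration.

```python
class Item:
    """Hypothetical stand-in for the imported scrapyitem class."""


def broken():
    # The assignment makes `Item` local to this function, so the call on
    # the right-hand side reads the local name before it is bound.
    Item = Item()
    return Item


def fixed():
    # A different name for the instance leaves `Item` as the global class.
    item = Item()
    return item


try:
    broken()
except UnboundLocalError as exc:
    error = exc  # keep a reference; `exc` is cleared after the except block
    print(type(exc).__name__)
```

Declaring the name `global` inside the function, as the asker did, also avoids the error, but renaming the local variable is usually the less intrusive fix.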