I have to parse a file that contains a lot of links. Here is an example of what it looks like:
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
pls/facebook?funn=wordlis&sys;sys;colorsdif_id=11908675">colors</p></hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
pls/facebook?funn=wordlis&sys;sys;colorsdif_id=45103481">yelloW</p></hm>
<td>I have a dream, and it is all good 2</hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
pls/facebook?funn=wordlis&sys;sys;colorsdif_id=40984930">orangE</p></hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
pls/facebook?funn=wordlis&sys;sys;colorsdif_id=90648361">pinK</p></hm>
I only need to keep the words that appear in the same position as >colors<, so I also want >yelloW<, >orangE< and >pinK<.
In this example, the part the lines have in common is the whole link, except for the number (the id, which is different in every link) and the word itself.
Once I have found all the words, I want to save them in a dictionary that uses the first word as the key and the remaining words as its list of values, so the final result will be:
d = {"colors": ["yelloW", "orangE", "pinK"]}
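
For reference, here is a minimal sketch of the kind of thing I am after, using Python's `re` module. It assumes the file contents are already loaded into a string, and that the line break shown inside each link may really be present in the file, so the pattern only anchors on the part of the link that follows the break. The names `sample`, `WORD_RE` and `words_to_dict` are made up for illustration.

```python
import re

# Sample copied from the example above; the line break inside each link is kept as shown.
sample = '''<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
pls/facebook?funn=wordlis&sys;sys;colorsdif_id=11908675">colors</p></hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
pls/facebook?funn=wordlis&sys;sys;colorsdif_id=45103481">yelloW</p></hm>
<td>I have a dream, and it is all good 2</hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
pls/facebook?funn=wordlis&sys;sys;colorsdif_id=40984930">orangE</p></hm>
<hm><w syst="whatrudoing" please="http://facebook.com.u/qwe-
pls/facebook?funn=wordlis&sys;sys;colorsdif_id=90648361">pinK</p></hm>
'''

# Fixed tail of the link, then the varying numeric id, then the word up to the next "<".
WORD_RE = re.compile(r'pls/facebook\?funn=wordlis&sys;sys;colorsdif_id=\d+">([^<]+)<')

def words_to_dict(text):
    """Return {first_word: [remaining words]} for every link that matches WORD_RE."""
    words = WORD_RE.findall(text)
    if not words:
        return {}
    return {words[0]: words[1:]}

d = words_to_dict(sample)
print(d)  # {'colors': ['yelloW', 'orangE', 'pinK']}
```

A regular expression seems workable here only because the links are so uniform. The sample is not well-formed markup (for example, `<w ...>` is closed by `</p>`), so I am not sure a strict XML parser would accept it.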