Reading next line from html

Question

I am using the Beautiful Soup package for web scraping, and I want to put the scraped lines into a dictionary named table, where each key has multiple values.

This dictionary represents a table and will eventually be turned into one.

I have scraped the HTML to get the values, but the problem is reading the next line from the HTML and matching it with the correct key.

These names are the dictionary key names:

RowName
UpdateTime
State
OrdersC
TicketsR
OrdersNC
TicketsNR
ReadingTime
ClearingTime
ClearingInProgress
Volumes
StartTime
StopTime

This is how the data looks (when printed to console):

(NOTE: There will be more than two of these result sets)

NYBOT 
00:10:39 
Not Connected 
0 
7043 
0 
7043 
07:58:30 
--:--:-- 
0 
0 
02:30:00  
20:00:00 
MONTREAL 
N/A 
N/A 
0 
145 
0 
145 
07:59:01 
--:--:-- 
0 
0 
01:00:00  
20:00:00

So the dictionary will look like:

{'RowName': ['NYBOT', 'MONTREAL'], 'UpdateTime': ['00:10:39', 'N/A'], ..., 'StopTime': ['20:00:00', '20:00:00']}
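One way to match each value with the correct key is to use its position. This is a minimal sketch that assumes the td texts arrive in one flat list, in exactly the order shown above, with 13 cells per result set. Under that assumption, cell i belongs to key i % 13. The list named cells and the helper build_table are hypothetical stand-ins, not part of the original code:

```python
from collections import defaultdict

KEYS = ['RowName', 'UpdateTime', 'State', 'OrdersC', 'TicketsR',
        'OrdersNC', 'TicketsNR', 'ReadingTime', 'ClearingTime',
        'ClearingInProgress', 'Volumes', 'StartTime', 'StopTime']

# Hypothetical input: the stripped text of every <td>, in page order
# (in the real script, e.g. [td.get_text(strip=True) for td in site.find_all('td')]).
cells = ['NYBOT', '00:10:39', 'Not Connected', '0', '7043', '0', '7043',
         '07:58:30', '--:--:--', '0', '0', '02:30:00', '20:00:00',
         'MONTREAL', 'N/A', 'N/A', '0', '145', '0', '145',
         '07:59:01', '--:--:--', '0', '0', '01:00:00', '20:00:00']


def build_table(cells, keys):
    """Group a flat list of cell texts into {key: [one value per result set]}."""
    if len(cells) % len(keys):
        raise ValueError('cell count is not a multiple of the key count')
    table = defaultdict(list)
    for i, value in enumerate(cells):
        # The position inside the result set decides which key the value belongs to
        table[keys[i % len(keys)]].append(value)
    return dict(table)


table = build_table(cells, KEYS)
print(table['RowName'])   # ['NYBOT', 'MONTREAL']
print(table['StopTime'])  # ['20:00:00', '20:00:00']
```

This only works if every result set has exactly the same cells in the same order. The length check catches a truncated last set, but it cannot detect a missing cell in the middle.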

I have tried the following, but it fails with an error saying that next() cannot iterate over strings:

for line in site.find_all('td'):
  line  = line.strip()
  table.update(RowName = line.text.replace('\xa0', ''))
  next(line)
  .
  .
  .
  next(line)
  table.update(StopTime = line.text.replace('\xa0', ''))
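The built-in next() expects an iterator. A string or a list is iterable but is not itself an iterator, so passing one to next() raises a TypeError. The idea in the attempt above can still work if iter() is called once on the list of cell texts and next() then pulls one value per key. A minimal sketch, again assuming a hypothetical flat list named cells in place of the scraped td text (rows_from_cells is an illustrative name):

```python
KEYS = ['RowName', 'UpdateTime', 'State', 'OrdersC', 'TicketsR',
        'OrdersNC', 'TicketsNR', 'ReadingTime', 'ClearingTime',
        'ClearingInProgress', 'Volumes', 'StartTime', 'StopTime']

# Hypothetical input: stripped <td> texts in page order
cells = ['NYBOT', '00:10:39', 'Not Connected', '0', '7043', '0', '7043',
         '07:58:30', '--:--:--', '0', '0', '02:30:00', '20:00:00',
         'MONTREAL', 'N/A', 'N/A', '0', '145', '0', '145',
         '07:59:01', '--:--:--', '0', '0', '01:00:00', '20:00:00']


def rows_from_cells(cells, keys):
    """Return one {key: value} dict per result set, reading values with next()."""
    it = iter(cells)  # next() needs an iterator; a str or a list is not one
    rows = []
    for first in it:  # each pass starts a new result set at its RowName
        row = {keys[0]: first}
        for key in keys[1:]:
            value = next(it, None)  # "read the next line"; None once cells run out
            if value is None:
                raise ValueError(f'result set {first!r} has no value for {key}')
            row[key] = value
        rows.append(row)
    return rows


rows = rows_from_cells(cells, KEYS)
# Reshape into the dict-of-lists form asked for in the question
table = {key: [row[key] for row in rows] for key in KEYS}
print(table['State'])  # ['Not Connected', 'N/A']
```

The for loop and the inner next() calls share one iterator. As a result, each outer pass begins exactly where the previous result set ended.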

.find_all('td') already gives you each line as an element of a list. If you already know the number and order of the elements, you can use two lists and create the dictionary with "dict(zip(keys, values))". — SorenLantz, Nov 16 '18 at 15:11
Possible duplicate of [Python beautifulsoup grab table](https://stackoverflow.com/questions/22812536/python-beautifulsoup-grab-table) — stovfl, Nov 16 '18 at 15:16
@SorenLantz, I agree. I have tried the zip() function, but it did not pair the values as expected. There will be more than one set of results. — swagless_monk, Nov 16 '18 at 15:27
@swagless_monk If that is the case, then line.text.replace may not be returning the string you want. — SorenLantz, Nov 16 '18 at 15:29
@SorenLantz, I am not sure that is the case, because the replace call only removes extraneous characters from the existing strings. — swagless_monk, Nov 16 '18 at 15:45

score 0 · Answer 1 · answered Nov 16 '18 at 15:18


Because you already know the number and order of the elements, you can build the dictionary by zipping the list of key names with the list of cell values.

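# site is the BeautifulSoup object from the question
characteristics = ['RowName', 'UpdateTime', 'State', 'OrdersC', 'TicketsR',
                   'OrdersNC', 'TicketsNR', 'ReadingTime', 'ClearingTime',
                   'ClearingInProgress', 'Volumes', 'StartTime', 'StopTime']
data = []

for cell in site.find_all('td'):
    # Take the cell's text, remove non-breaking spaces, then trim whitespace
    text = cell.get_text().replace('\xa0', '').strip()
    data.append(text)  # append, not extend: extend would add the string character by character

info = dict(zip(characteristics, data))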
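The same dict(zip(...)) idea extends to several result sets if data is first cut into slices of len(characteristics) cells, one slice per result set. A minimal sketch, assuming data already holds the cleaned cell texts in page order (the sample values below are copied from the question, and split_rows is a hypothetical helper):

```python
characteristics = ['RowName', 'UpdateTime', 'State', 'OrdersC', 'TicketsR',
                   'OrdersNC', 'TicketsNR', 'ReadingTime', 'ClearingTime',
                   'ClearingInProgress', 'Volumes', 'StartTime', 'StopTime']

# Hypothetical stand-in for the cleaned cell texts collected by the loop above
data = ['NYBOT', '00:10:39', 'Not Connected', '0', '7043', '0', '7043',
        '07:58:30', '--:--:--', '0', '0', '02:30:00', '20:00:00',
        'MONTREAL', 'N/A', 'N/A', '0', '145', '0', '145',
        '07:59:01', '--:--:--', '0', '0', '01:00:00', '20:00:00']


def split_rows(values, keys):
    """Return one dict per result set by zipping keys with consecutive slices."""
    n = len(keys)
    # values[0:n] is the first result set, values[n:2*n] the second, and so on
    return [dict(zip(keys, values[i:i + n])) for i in range(0, len(values), n)]


rows = split_rows(data, characteristics)
print(len(rows))           # 2
print(rows[1]['RowName'])  # MONTREAL
```

zip() stops at the shorter input. An incomplete final result set would therefore silently produce a dict with fewer keys, and checking len(data) % len(characteristics) == 0 beforehand guards against that.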

answered Nov 16 '18 at 15:18

SorenLantz


This works for the case with a single result set, but there is more than one set to consider. Nevertheless, this is an acceptable solution, and perhaps I can apply the same approach to the case with more result sets. – swagless_monk Nov 16 '18 at 15:47

score 0 · Accepted Answer · answered Nov 19 '18 at 14:18


<<dict_name>> = {z[0]:list(z[1:]) for z in zip(<<keys>>,<<value_1>>, <<value_2>>,..., <<value_N>>)}

This is what did the trick for me. It creates a dictionary in which each key has multiple values, one taken from each of the value lists <<value_1>> through <<value_N>>.
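As a concrete instance of the template above, the two result sets from the question can play the roles of <<value_1>> and <<value_2>>. The names row_1, row_2 and rows are only illustrative. Each tuple z produced by zip is (key, value from row_1, value from row_2), so z[0] is the key and z[1:] holds its values. Unpacking a list of rows with * lets the same line handle any number of result sets:

```python
keys = ['RowName', 'UpdateTime', 'State', 'OrdersC', 'TicketsR',
        'OrdersNC', 'TicketsNR', 'ReadingTime', 'ClearingTime',
        'ClearingInProgress', 'Volumes', 'StartTime', 'StopTime']

# Illustrative value lists: one per result set, in key order
row_1 = ['NYBOT', '00:10:39', 'Not Connected', '0', '7043', '0', '7043',
         '07:58:30', '--:--:--', '0', '0', '02:30:00', '20:00:00']
row_2 = ['MONTREAL', 'N/A', 'N/A', '0', '145', '0', '145',
         '07:59:01', '--:--:--', '0', '0', '01:00:00', '20:00:00']
rows = [row_1, row_2]  # further result sets can be appended here

# Each z is (key, value from row_1, value from row_2, ...)
table = {z[0]: list(z[1:]) for z in zip(keys, *rows)}
print(table['RowName'])   # ['NYBOT', 'MONTREAL']
print(table['StopTime'])  # ['20:00:00', '20:00:00']
```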

answered Nov 19 '18 at 14:18

swagless_monk

