This program outputs a table with two columns. Can I specify which columns to display?
import os

import pandas as pd
from bs4 import BeautifulSoup

# INPUT_DATA_DIR is assumed to be a pathlib.Path defined elsewhere


def main():
    # One row per file found in the input directory
    bulletins = os.listdir(INPUT_DATA_DIR)
    df = pd.DataFrame(bulletins)
    df.columns = ['filename']
    df['html'] = df.filename.apply(read_file)
    print(df.head())


def get_document_id(page):
    soup = BeautifulSoup(page, 'lxml')
    div = soup.find('div')
    print(div)


def read_file(filename):
    with open(INPUT_DATA_DIR / filename, 'r') as f:
        data = f.read()
    return data
Right now I have two columns, but there will be more in the future. Can I output only certain columns? For example, can I output just the first two?
At the moment I have this table:
filename html
0 support.hpe.com-hpesc-public-api-document-c008... <!DOCTYPE html><html xmlns:msxsl="urn:schemas-...
1 support.hpe.com-hpesc-public-api-document-c043... <!DOCTYPE html><html xmlns:msxsl="urn:schemas-...
2 support.hpe.com-hpesc-public-api-document-c008... <!DOCTYPE html><html xmlns:msxsl="urn:schemas-...
3 support.hpe.com-hpesc-public-api-document-c007... <!DOCTYPE html><html xmlns:msxsl="urn:schemas-...
4 support.hpe.com-hpesc-public-api-document-c018... <!DOCTYPE html><html xmlns:msxsl="urn:schemas-...
.. ... ...
442 support.hpe.com-hpesc-public-api-document-c009... <!DOCTYPE html><html xmlns:msxsl="urn:schemas-...
443 support.hpe.com-hpesc-public-api-document-c021... <!DOCTYPE html><html xmlns:msxsl="urn:schemas-...
444 support.hpe.com-hpesc-public-api-document-c009... <!DOCTYPE html><html xmlns:msxsl="urn:schemas-...
445 support.hpe.com-hpesc-public-api-document-c008... <!DOCTYPE html><html xmlns:msxsl="urn:schemas-...
446 support.hpe.com-hpesc-public-api-document-c008... <!DOCTYPE html><html xmlns:msxsl="urn:schemas-...
[447 rows x 2 columns]
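
For illustration, here is a minimal sketch of what I mean by "only certain columns". It assumes the usual pandas selection by label (`df[[...]]`) and by position (`df.iloc`). The sample rows and the third column `doc_id` are made up for the example and are not part of my real data.

```python
import pandas as pd

# Made-up stand-in for the real DataFrame; "doc_id" is a hypothetical extra column
df = pd.DataFrame({
    "filename": ["a.html", "b.html"],
    "html": ["<html>a</html>", "<html>b</html>"],
    "doc_id": ["c001", "c002"],
})

# Select columns by name: pass a list of labels
by_name = df[["filename", "html"]]

# Select the first two columns by position, whatever they are called
first_two = df.iloc[:, :2]

print(by_name.head())
print(first_two.head())
```

Selecting by name keeps working if columns are later reordered, while `iloc` depends only on position, so which one fits depends on how the table grows.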