Pandas/BeautifulSoup - Getting Data from a URL with .txt

Question

I'm trying to get data from this website, which has .txt in the URL. I'm a Python newbie and just started last week. Here's the link:

http://regsho.finra.org/FNSQshvol20171121.txt

I tried using pandas, requests.get, and BeautifulSoup to get the data, and I think I did it right. The next problem is: how can I index and work with the data I just got? Here is my code:

import requests
from bs4 import BeautifulSoup

page = requests.get('http://regsho.finra.org/FNSQshvol20171121.txt')
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())

OR

import pandas as pd
list = pd.read_table('http://regsho.finra.org/FNSQshvol20171121.txt')
list.head()
list.columns

How can I index the data I got from the website or select just certain columns?

list['Date', 'Symbol']
list[5:12]

and so on.

Please help! I feel like there should be a simpler way that doesn't require taking the hard route.

Any help is really appreciated!

score 0 · Accepted Answer · answered Nov 25 '17 at 02:00

I don't believe you read your dataframe correctly. The fields in this file are separated by '|', so if you're using read_table, pass sep='|' to split the data into separate columns.

df = pd.read_table('http://regsho.finra.org/FNSQshvol20171121.txt', sep='|')

df.head()

       Date Symbol  ShortVolume  ShortExemptVolume  TotalVolume Market
0  20171121      A     625382.0             3586.0    1467570.0      Q
1  20171121     AA     873300.0             3417.0    2158580.0      Q
2  20171121   AAAP       4185.0              135.0     412030.0      Q
3  20171121   AABA     452857.0              300.0    4045918.0      Q
4  20171121    AAC      21235.0             1501.0      45747.0      Q
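To see why sep matters without hitting the network, here is a minimal sketch. The sample string is a hypothetical stand-in that imitates the layout of the file (three rows retyped from the output above, with the volumes written as integers), and the names sample, wrong, and right are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample imitating the pipe-delimited layout of the file
sample = (
    "Date|Symbol|ShortVolume|ShortExemptVolume|TotalVolume|Market\n"
    "20171121|A|625382|3586|1467570|Q\n"
    "20171121|AA|873300|3417|2158580|Q\n"
    "20171121|AAAP|4185|135|412030|Q\n"
)

# read_table splits on tabs by default, so every line ends up in one column
wrong = pd.read_table(io.StringIO(sample))

# With sep='|', each field becomes its own column
right = pd.read_table(io.StringIO(sample), sep="|")

print(wrong.shape)          # (3, 1)
print(right.shape)          # (3, 6)
print(list(right.columns))
```

If df.columns shows a single long name containing '|' characters, that is a sign the separator was not picked up.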

Now, df[['X', 'Y', ...]] gives you a dataframe slice of selected columns:

df[['Date', 'Symbol']].head()

       Date Symbol
0  20171121      A
1  20171121     AA
2  20171121   AAAP
3  20171121   AABA
4  20171121    AAC

To select a row-column subslice, use loc. Note that loc slices by label and includes both endpoints:

df.loc[5:12, ['Date', 'Symbol']]

        Date Symbol
5   20171121   AADR
6   20171121    AAL
7   20171121   AAMC
8   20171121   AAME
9   20171121    AAN
10  20171121   AAOI
11  20171121   AAON
12  20171121    AAP
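Since the question used a plain [5:12] slice, it may help to compare the two forms side by side. This is a small sketch on a made-up dataframe (the Symbol and Volume values are placeholders, not real data), assuming the default 0, 1, 2, ... integer index.

```python
import pandas as pd

# Hypothetical frame with a default integer index 0..19
df = pd.DataFrame({"Symbol": [f"S{i}" for i in range(20)], "Volume": range(20)})

by_label = df.loc[5:12, ["Symbol"]]   # label-based: rows 5 through 12 inclusive
by_position = df.iloc[5:12]           # position-based: rows 5 through 11
plain_slice = df[5:12]                # a plain row slice is positional too

print(len(by_label))      # 8
print(len(by_position))   # 7
```

With a default index the labels and positions coincide, so the only visible difference is whether the last row is included.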

These operations are not in-place; they return a new object. If you want df itself to hold the result, assign it back:

df = df.loc[5:12, ['Date', 'Symbol']]

Beware, you lose your original dataframe. If you want to assign the slice to a different variable, you can do that too, but make a copy so that later assignments to the slice don't trigger a SettingWithCopyWarning.

df2 = df.loc[5:12, ['Date', 'Symbol']].copy()
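As a quick check that the copy is independent of the original, here is a minimal sketch on a tiny hand-built dataframe (the four rows are retyped from the output above; the value "CHANGED" is just a marker for illustration).

```python
import pandas as pd

df = pd.DataFrame({
    "Date": [20171121] * 4,
    "Symbol": ["A", "AA", "AAAP", "AABA"],
})

# Take a row-column slice and copy it so it owns its data
df2 = df.loc[1:2, ["Date", "Symbol"]].copy()

# Modify the copy only
df2.loc[1, "Symbol"] = "CHANGED"

print(df.loc[1, "Symbol"])    # AA
print(df2.loc[1, "Symbol"])   # CHANGED
```

The original df keeps its value, so you can keep working with both frames separately.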
