I want to write a function that takes a URL as input (in our case https://finance.yahoo.com/quote/AAPL/balance-sheet?p=AAPL&guccounter=2) and uses Jsoup with pattern matching to extract the numbers that sit inside a div tag and a nested span tag.
Goal: I'm looking to get the values 153,982,000, 125,481,000, 105,392,000, 105,718,000 from the Current Liabilities row for the columns 9/30/2022, 9/30/2021, 9/30/2020, 9/30/2019.
In the browser inspector, these values appear under the following tags:
Current Liabilities 1: <div class="Ta(c) Py(6px) Bxz(bb) BdB Bdc($seperatorColor) Miw(120px) Miw(100px)--pnclg D(tbc)" data-test="fin-col"><span>153,982,000</span></div>
Current Liabilities 2: <div class="Ta(c) Py(6px) Bxz(bb) BdB Bdc($seperatorColor) Miw(120px) Miw(100px)--pnclg Bgc($lv1BgColor) fi-row:h_Bgc($hoverBgColor) D(tbc)" data-test="fin-col"><span>125,481,000</span></div>
Current Liabilities 3: <div class="Ta(c) Py(6px) Bxz(bb) BdB Bdc($seperatorColor) Miw(120px) Miw(100px)--pnclg D(tbc)" data-test="fin-col"><span>105,392,000</span></div>
Current Liabilities 4: <div class="Ta(c) Py(6px) Bxz(bb) BdB Bdc($seperatorColor) Miw(120px) Miw(100px)--pnclg Bgc($lv1BgColor) fi-row:h_Bgc($hoverBgColor) D(tbc)" data-test="fin-col"><span>105,718,000</span></div>
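For reference, here is a minimal sketch of the kind of pattern matching I have in mind, run on the four tags above. The class name `FinColExtractor` and the method `extractFinCols` are made up for illustration, and the sketch assumes the HTML passed in has already been narrowed down to the Current Liabilities row, because the pattern matches every `data-test="fin-col"` cell it sees. A regex over HTML is brittle, so a Jsoup selector such as `div[data-test=fin-col] > span` is probably the more robust route.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FinColExtractor {
    // Matches a div that carries data-test="fin-col" and captures the text of its inner span.
    // Assumption: the span is the first child of the div, as in the tags shown above.
    private static final Pattern FIN_COL = Pattern.compile(
            "<div[^>]*data-test=\"fin-col\"[^>]*>\\s*<span>([^<]*)</span>");

    // Returns the span text of every fin-col cell in the given HTML, in document order.
    public static List<String> extractFinCols(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = FIN_COL.matcher(html);
        while (m.find()) {
            values.add(m.group(1).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened versions of the first two cells from the question.
        String html =
                "<div class=\"Ta(c) Bdc($seperatorColor) D(tbc)\" data-test=\"fin-col\">"
              + "<span>153,982,000</span></div>"
              + "<div class=\"Ta(c) Bgc($lv1BgColor) D(tbc)\" data-test=\"fin-col\">"
              + "<span>125,481,000</span></div>";
        for (String value : extractFinCols(html)) {
            System.out.println(value);
        }
    }
}
```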
Please fix my current code, shown below, so that it returns the row and column data rather than a flat list of strings:
private static List<String> balancefetch(String url) throws IOException {
    String userAgent1 = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36 OPR/56.0.3051.43";
    Document doc = Jsoup.connect(url).userAgent(userAgent1).get();
    List<String> balanceValues = new ArrayList<>();
    Element totalAssets = doc.select("[title=Total Assets]").first();
    Elements totalAssetsData = totalAssets.parent().siblingElements();
    for (Element e : totalAssetsData) {
        Log.d("totalAssetsData", e.text());
        balanceValues.add(e.text());
    } // This works so far.
The lines below don't work because there is no element with title=Current Liabilities on the website.
    Element currentLiabilities = doc.select("[title=Current Liabilities]").first();
    // Was totalLiabilities, which is not declared anywhere in this method.
    Elements currentLiabilitiesData = currentLiabilities.parent().siblingElements();
    for (Element e : currentLiabilitiesData) {
        Log.d("currentLiabilitiesData", e.text());
        balanceValues.add(e.text());
    }
    return balanceValues;
}
Please help me get the output in the following format:
dates = {9/30/2022, 9/30/2021, 9/30/2020, 9/30/2019}
CurrentLiabilities = {153,982,000, 125,481,000, 105,392,000, 105,718,000 }
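One way the format above could be represented in code is a map from column date to row value. This is only a sketch: the class name `BalanceRow` and the methods `parseAmount` and `toRow` are hypothetical, and it assumes every cell is a plain comma-grouped integer (a placeholder such as "-" for a missing value would make `Long.parseLong` throw).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BalanceRow {
    // Parses a comma-grouped figure such as "153,982,000" into a long.
    public static long parseAmount(String text) {
        return Long.parseLong(text.replace(",", "").trim());
    }

    // Pairs each column date with the value at the same index.
    // LinkedHashMap keeps the columns in the order they appear on the page.
    public static Map<String, Long> toRow(List<String> dates, List<String> values) {
        if (dates.size() != values.size()) {
            throw new IllegalArgumentException("dates and values must have the same length");
        }
        Map<String, Long> row = new LinkedHashMap<>();
        for (int i = 0; i < dates.size(); i++) {
            row.put(dates.get(i), parseAmount(values.get(i)));
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList("9/30/2022", "9/30/2021", "9/30/2020", "9/30/2019");
        List<String> values = Arrays.asList("153,982,000", "125,481,000", "105,392,000", "105,718,000");
        // prints {9/30/2022=153982000, 9/30/2021=125481000, 9/30/2020=105392000, 9/30/2019=105718000}
        System.out.println(toRow(dates, values));
    }
}
```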
EDIT:
Please also fix my code to extract the date values, which are under the following tags:
Date1: <div class="Ta(c) Py(6px) Bxz(bb) BdB Bdc($seperatorColor) Miw(120px) Miw(100px)--pnclg D(ib) Fw(b)"><span>9/30/2022</span></div>
Date2: <div class="Ta(c) Py(6px) Bxz(bb) BdB Bdc($seperatorColor) Miw(120px) Miw(100px)--pnclg D(ib) Fw(b) Bgc($lv1BgColor)"><span>9/30/2021</span></div>
Date3: <div class="Ta(c) Py(6px) Bxz(bb) BdB Bdc($seperatorColor) Miw(120px) Miw(100px)--pnclg D(ib) Fw(b)"><span>9/30/2020</span></div>
Date4: <div class="Ta(c) Py(6px) Bxz(bb) BdB Bdc($seperatorColor) Miw(120px) Miw(100px)--pnclg D(ib) Fw(b) Bgc($lv1BgColor)"><span>9/30/2019</span></div>
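For the date cells above, something along these lines might work as a starting point. It is a sketch under assumptions: the names `HeaderDates` and `extractDates` are invented for illustration, the pattern assumes dates always appear as M/d/yyyy inside a bare `<span>`, and it would also pick up any other span on the page whose text looks like a date. On Android, `java.time` is only available natively from API level 26.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderDates {
    // Captures span text that looks like a month/day/year date, e.g. 9/30/2022.
    private static final Pattern DATE_SPAN =
            Pattern.compile("<span>\\s*(\\d{1,2}/\\d{1,2}/\\d{4})\\s*</span>");

    // Single-letter M and d accept one or two digits, matching "9/30/2022".
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("M/d/yyyy");

    // Returns every date-like span in the HTML as a LocalDate, in document order.
    public static List<LocalDate> extractDates(String html) {
        List<LocalDate> dates = new ArrayList<>();
        Matcher m = DATE_SPAN.matcher(html);
        while (m.find()) {
            dates.add(LocalDate.parse(m.group(1), FORMAT));
        }
        return dates;
    }

    public static void main(String[] args) {
        // Shortened versions of the first two header cells from the question.
        String html =
                "<div class=\"Ta(c) D(ib) Fw(b)\"><span>9/30/2022</span></div>"
              + "<div class=\"Ta(c) D(ib) Fw(b) Bgc($lv1BgColor)\"><span>9/30/2021</span></div>";
        for (LocalDate date : extractDates(html)) {
            System.out.println(date);
        }
    }
}
```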