I have a text file that I'm extracting text from using its punctuation and indentation patterns. The output should be a list of lists that pairs two fields, company name and description:
[[company,description],[company,description]]
To do that, I'm running a while loop nested inside a for loop to extract the description for each company. Here's my code:
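import re

companies = []
desc = []

with open("companies.txt", encoding="utf-8") as file:  # hypothetical filename
    for line in file:
        # Company lines are not indented (assumed: leading whitespace marks description lines)
        if not re.search(r"^\s", line):
            name = line.split(',', 1)[0]
            companies.append(name)
            print(companies)
            companies = []
        # Description lines start with whitespace
        while re.search(r"^\s", line):
            desc.append(line)
            print(desc)
            desc = []
            break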
Sample from text file:
XYZ Group, a nearly nine-year-old, Copenhagen-based company that has built a dual-purpose platform, providing both accountancy software and a marketplace for small and medium businesses to find accountants, has landed $73 million in growth funding from a single investor, Lugard Road Capital. TechCrunch has more here.
Black Lake, a nearly five-year-old, China-based software platform for factory workers to log their daily tasks and managers to oversee the plant floor, recently raised $77 million in funding, including from Singapore’s sovereign wealth fund Temasek, which led the round, as well as China Renaissance and Lightspeed Venture Partners. The outfit has now raised more than $100 million altogether, including from from GGV...
Output:
['XYZ Group']
[' company that has built a dual-purpose platform, providing both']
[' accountancy software and a marketplace for small and medium']
[' businesses to find accountants, has landed 73 million in growth funding from a single investor,']
[' Lugard Road Capital TechCrunch has more']
[' here']
['Black Lake']
[' platform for factory workers to log their daily tasks and managers']
[' to oversee the plant floor, recently raised 77 million in funding,']
[' including from Singapore’s sovereign wealth fund Temasek,']
[' which led the round, as well as China']
[' Renaissance and Lightspeed Venture']
[' Partners The outfit has now raised more than 100']
[' million altogether, including from from GGV']
[' Capital, Bertelsmann Asia Investments,']
[' GSR Ventures, ZhenFund']
[' and others TechCrunch has more']
[' here']
The goal is to join the desc lines under each company name into one list.
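One way to do this, sketched under the assumption that company lines start at column 0 and description lines start with whitespace (as the output above suggests), is to keep a "current" record and only append it to the result when the next company line begins, plus once more after the loop for the last company. The function name parse_companies, keeping the text after the first comma as part of the description, and joining the lines with a single space are illustrative choices, not part of the original code.

```python
import re

def parse_companies(lines):
    """Group indented description lines under the preceding company line.

    Assumption: a company line starts with a non-whitespace character, and
    its description continues on the following lines that start with whitespace.
    """
    records = []
    name = None
    desc = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if re.match(r"\s", line):
            # Indented line: part of the current company's description
            desc.append(line.strip())
        else:
            # New company line: store the finished previous record first
            if name is not None:
                records.append([name, " ".join(desc)])
            name, _, rest = line.partition(",")
            name = name.strip()
            # Keep the text after the first comma as the start of the description
            desc = [rest.strip()] if rest.strip() else []
    # The last company has no following company line, so store it after the loop
    if name is not None:
        records.append([name, " ".join(desc)])
    return records

# Shortened, illustrative lines based on the sample above
sample = [
    "XYZ Group, a nearly nine-year-old, Copenhagen-based\n",
    "    company that has built a dual-purpose platform\n",
    "    here\n",
    "Black Lake, a nearly five-year-old, China-based software\n",
    "    platform for factory workers\n",
]
print(parse_companies(sample))
```

Because an open file iterates line by line, the real file can be passed in directly, e.g. with open(path) as f: records = parse_companies(f). If you prefer to drop the text after the first comma, as the original split(',', 1)[0] does, start desc as an empty list instead.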
Update:
I put desc = [] outside of the while loop and I'm getting this:
['XYZ Group']
[' company that has built a dual-purpose platform, providing both']
[' company that has built a dual-purpose platform, providing both', ' accountancy software and a marketplace for small and medium']
[' company that has built a dual-purpose platform, providing both', ' accountancy software and a marketplace for small and medium', ' businesses to find accountants, has landed 73 million in growth funding from a single investor,']
[' company that has built a dual-purpose platform, providing both', ' accountancy software and a marketplace for small and medium', ' businesses to find accountants, has landed 73 million in growth funding from a single investor,', ' Lugard Road Capital TechCrunch has more']
[' company that has built a dual-purpose platform, providing both', ' accountancy software and a marketplace for small and medium', ' businesses to find accountants, has landed 73 million in growth funding from a single investor,', ' Lugard Road Capital TechCrunch has more', ' here']
I only need the last iteration, though.
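Printing inside the loop shows the list after every append; the value wanted is the state just before the next company line begins. An alternative that avoids intermediate states entirely, again assuming company lines are unindented and description lines are indented, is to read the whole file into one string and split it into per-company blocks with a zero-width regex. This is a sketch: split_blocks is a hypothetical helper, and re.split on a pattern that can match an empty string requires Python 3.7 or later.

```python
import re

def split_blocks(text):
    """Return [name, description] pairs, one per company block.

    Assumption: each block starts with an unindented line and continues
    with indented lines until the next unindented line.
    """
    records = []
    # Split just before every line that starts with a non-whitespace character.
    for block in re.split(r"^(?=\S)", text, flags=re.MULTILINE):
        if not block.strip():
            continue  # skip the empty piece before the first match
        first, *rest = block.splitlines()
        name, _, head = first.partition(",")
        # Text after the first comma plus the indented lines form the description.
        parts = [head.strip()] + [line.strip() for line in rest]
        records.append([name.strip(), " ".join(p for p in parts if p)])
    return records

# Shortened, illustrative text based on the sample above
text = (
    "XYZ Group, a nearly nine-year-old, Copenhagen-based\n"
    "    company that has built a dual-purpose platform\n"
    "    here\n"
    "Black Lake, a nearly five-year-old, China-based software\n"
    "    platform for factory workers\n"
)
print(split_blocks(text))
```

With a real file this would be called as split_blocks(f.read()). Each block is complete before it is processed, so there is no partially filled desc list to print.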