
I want to write 2 DataFrames to 1 CSV file.

My code runs and writes both DataFrames; however, I want to remove the blank space in columns D, E, F of row 2, so that the information in row 3 appears right under its headers.

The data itself is correct; my question is only about the formatting.

(This will make sense if you run the code; the file it produces is very small.)

import pandas as pd
from bs4 import BeautifulSoup
import csv
import requests
line1=[]

url='https://clinicaltrials.gov/ct2/show/NCT03548207'
r=requests.get(url)
soup=BeautifulSoup(r.content,'html.parser')
content=soup.find_all('div',id='main-content')
for item in content:
    title=item.find('h1',class_='tr-h1 ct-sans-serif tr-solo_record').text
    sponsor=item.find('div', class_='tr-info-text').text
    summary=item.find('div',class_='ct-body3 tr-indent2').text
    dict={'Title':title,'Sponsor':sponsor,'Summary':summary}
    line1.append(dict)

df=pd.DataFrame(line1)


url2='https://clinicaltrials.gov/ct2/show/NCT03548207'
table1=pd.read_html(url2)[2]
dj=pd.DataFrame(table1)
kk=df.append(dj)
kk.to_csv('çode11.csv',index=False)
Void S
  • https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples Please make this an MRE. We shouldn't need to open links to see what is going on with your data. – noah Oct 26 '20 at 22:35
  • You don't need to open any link; the code saves to a CSV. – Void S Oct 26 '20 at 22:38
  • But we still need to interact with the URL, etc. Just show us the df. All the Beautiful Soup stuff has nothing to do with your question. Your question is "I have x df and I want y df". – noah Oct 26 '20 at 22:40

2 Answers


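df.append stacks the second DataFrame below the first, so its columns end up in a new row under new headers. You can use pd.concat with the axis=1 parameter instead, which places the two DataFrames side by side: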

import pandas as pd
from bs4 import BeautifulSoup
import requests

line1 = []

url = 'https://clinicaltrials.gov/ct2/show/NCT03548207'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
content = soup.find_all('div', id='main-content')
for item in content:
    title = item.find('h1', class_='tr-h1 ct-sans-serif tr-solo_record').text
    sponsor = item.find('div', class_='tr-info-text').text
    summary = item.find('div', class_='ct-body3 tr-indent2').text
    # Avoid naming the variable `dict`, which would shadow the built-in type.
    row = {'Title': title, 'Sponsor': sponsor, 'Summary': summary}
    line1.append(row)

# DataFrame with the Title, Sponsor and Summary columns
df = pd.DataFrame(line1)

url2 = 'https://clinicaltrials.gov/ct2/show/NCT03548207'
# read_html already returns DataFrames, so no pd.DataFrame() wrapper is needed
dj = pd.read_html(url2)[2]

kk = pd.concat([df, dj], axis=1)  # <--- pd.concat
print(kk)
kk.to_csv('data.csv', index=False)

Prints:

                                               Title  ...             Phase
0  A Study of JNJ-68284528, a Chimeric Antigen Re...  ...  Phase 1  Phase 2

[1 rows x 6 columns]

And saves data.csv with all six columns side by side, so the values sit directly under their headers.
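To see the difference without hitting the website, here is a minimal sketch with two made-up one-row DataFrames (the column names and values are placeholders, not the real scraped data). It compares stacking, which is what df.append did, with column-wise concatenation:

```python
import pandas as pd

# Hypothetical stand-ins for the scraped data and the HTML table.
left = pd.DataFrame({"Title": ["t"], "Sponsor": ["s"], "Summary": ["x"]})
right = pd.DataFrame({"Condition": ["c"], "Intervention": ["i"], "Phase": ["p"]})

# axis=0 (the default) stacks rows, like df.append did:
# 2 rows x 6 columns, with blanks (NaN) where a frame has no such column.
stacked = pd.concat([left, right], ignore_index=True)

# axis=1 puts the frames side by side: 1 row x 6 columns, no blanks.
side_by_side = pd.concat([left, right], axis=1)

print(stacked)
print(side_by_side)
```

Note that axis=1 aligns rows by index label. If the two frames do not share the same index, calling reset_index(drop=True) on each before concatenating should keep the rows lined up.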

Andrej Kesely

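Does shift solve your problem? In this example it moves the values in columns b and c up by one row, starting from row 1, so the blank cells get filled: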

import pandas as pd

df = pd.DataFrame({"a":[0,1,2,3,4,5], 'b':[10, None, 12, 13, 14, 15], 'c':[20, None, 22, 23, 24, 25]})
df.loc[1:,'b':'c']=df.loc[1:,'b':'c'].shift(-1)
print(df)
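Applied to a frame shaped like the one in the question, the same idea might look like the sketch below. The two-column frame is a made-up stand-in for the stacked result, and dropping the leftover empty row is an extra step not shown above:

```python
import pandas as pd

# Hypothetical stacked frame: the second column's value sits one row too low.
stacked = pd.DataFrame({
    "Title": ["A study", None],
    "Phase": [None, "Phase 1"],
})

fixed = stacked.copy()
# Move the Phase values up one row so they sit next to the Title value.
fixed["Phase"] = fixed["Phase"].shift(-1)
# The last row is now entirely empty; drop it and renumber the index.
fixed = fixed.dropna(how="all").reset_index(drop=True)

print(fixed)
```

shift(-1) always leaves an empty cell at the bottom of each shifted column, which is why the all-empty row is removed afterwards.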
noah