Remove whitespace from DataFrame/CSV

Question

I want to combine two DataFrames into one CSV file.

My code runs and writes out both DataFrames. However, I want to remove the blank cells in columns D, E, F of row 2, so that the values in row 3 appear directly under their headers.

The data itself is correct; my question is only about the layout.

(This will make sense if you run the code; the output file is very small.)

import pandas as pd
from bs4 import BeautifulSoup
import requests
line1=[]

url='https://clinicaltrials.gov/ct2/show/NCT03548207'
r=requests.get(url)
soup=BeautifulSoup(r.content,'html.parser')
content=soup.find_all('div',id='main-content')
for item in content:
    title=item.find('h1',class_='tr-h1 ct-sans-serif tr-solo_record').text
    sponsor=item.find('div', class_='tr-info-text').text
    summary=item.find('div',class_='ct-body3 tr-indent2').text
    row={'Title':title,'Sponsor':sponsor,'Summary':summary}  # "row" avoids shadowing the built-in dict
    line1.append(row)

df=pd.DataFrame(line1)


url2='https://clinicaltrials.gov/ct2/show/NCT03548207'
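dj=pd.read_html(url2)[2]  # read_html already returns DataFrames, so no extra wrapping is needed
kk=pd.concat([df, dj])  # stacks rows like the old df.append(dj); DataFrame.append was removed in pandas 2.0
kk.to_csv('code11.csv',index=False)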
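As the comments below suggest, the layout problem can be reproduced without any scraping. This sketch uses small hypothetical DataFrames (the values and the column names D, E, F are placeholders, not the real ClinicalTrials.gov data) to show what row-wise stacking writes to the CSV:

```python
import pandas as pd

# Hypothetical stand-ins for the scraped data (placeholder values):
# df plays the role of the Title/Sponsor/Summary frame, dj the role of
# the table read with pd.read_html. D, E, F mirror the spreadsheet columns.
df = pd.DataFrame([{"Title": "Study A", "Sponsor": "Sponsor X", "Summary": "Short summary"}])
dj = pd.DataFrame([{"D": "d1", "E": "e1", "F": "f1"}])

# Stacking along rows (pd.concat's default axis=0, which is what
# DataFrame.append did) gives each frame its own row, so each row has
# empty cells under the other frame's headers.
stacked = pd.concat([df, dj])

csv_text = stacked.to_csv(index=False)
print(csv_text)
```

The second line of the CSV ends in three empty fields and the third line starts with three, which is exactly the staggered layout described in the question.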

Please make this an MRE (minimal reproducible example); see https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples. We shouldn't need to open links to see what is going on with your data. – noah, Oct 26 '20 at 22:35
But we would still need to fetch the URL, etc. Just show us the DataFrames. All the BeautifulSoup code has nothing to do with your question, which is: I have DataFrame X and I want DataFrame Y. – noah, Oct 26 '20 at 22:40

score 1 · Accepted Answer · answered Oct 26 '20 at 23:03

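You can use pd.concat with the axis=1 parameter, which places the two DataFrames side by side (as extra columns) instead of stacking them as extra rows: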

import pandas as pd
from bs4 import BeautifulSoup
import requests

line1=[]

url='https://clinicaltrials.gov/ct2/show/NCT03548207'
r=requests.get(url)
soup=BeautifulSoup(r.content,'html.parser')
content=soup.find_all('div',id='main-content')
for item in content:
    title=item.find('h1',class_='tr-h1 ct-sans-serif tr-solo_record').text
    sponsor=item.find('div', class_='tr-info-text').text
    summary=item.find('div',class_='ct-body3 tr-indent2').text
    row={'Title':title,'Sponsor':sponsor,'Summary':summary}  # "row" avoids shadowing the built-in dict
    line1.append(row)

df=pd.DataFrame(line1)


url2='https://clinicaltrials.gov/ct2/show/NCT03548207'
dj=pd.read_html(url2)[2]  # read_html returns a list of DataFrames; take the third table

kk = pd.concat([df, dj], axis=1)  # <--- pd.concat
print(kk)
kk.to_csv('data.csv', index=False)

Prints:

                                               Title  ...             Phase
0  A Study of JNJ-68284528, a Chimeric Antigen Re...  ...  Phase 1  Phase 2

[1 rows x 6 columns]

And saves data.csv; opened in LibreOffice, it shows the header row followed by a single data row containing all six columns.
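A sketch of why axis=1 fixes the layout, using placeholder frames (hypothetical values, not the scraped data). pd.concat with axis=1 matches rows by index label, so it works here because both frames have a single row labelled 0; the second part shows what happens when the labels differ:

```python
import pandas as pd

# Hypothetical placeholder frames standing in for the scraped data.
df = pd.DataFrame([{"Title": "Study A", "Sponsor": "Sponsor X", "Summary": "Short summary"}])
dj = pd.DataFrame([{"D": "d1", "E": "e1", "F": "f1"}])

# axis=1 puts the frames side by side, matching rows by index label.
side_by_side = pd.concat([df, dj], axis=1)
print(side_by_side.to_csv(index=False))

# Alignment is by label, not position: if dj's only row were labelled 5,
# the result would have two rows (labels 0 and 5) with gaps again.
dj_relabelled = dj.rename(index={0: 5})
misaligned = pd.concat([df, dj_relabelled], axis=1)

# Dropping both indexes first makes the pairing purely positional.
aligned = pd.concat(
    [df.reset_index(drop=True), dj_relabelled.reset_index(drop=True)], axis=1
)
```

In the answer's code both frames have the default 0-based index (a DataFrame built from a list of dicts, and a table from read_html), so they already line up; reset_index(drop=True) only matters if either frame has been filtered or sorted first.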

score 0 · Answer 2 · answered Oct 26 '20 at 22:47


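Does shift solve your problem? In this example, shifting columns b and c up by one row from row 1 onward fills the gap left by the missing values in row 1: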

import pandas as pd

df = pd.DataFrame({"a":[0,1,2,3,4,5], 'b':[10, None, 12, 13, 14, 15], 'c':[20, None, 22, 23, 24, 25]})
df.loc[1:,'b':'c']=df.loc[1:,'b':'c'].shift(-1)
print(df)

answered by noah

it does not =-/ – Void S Oct 26 '20 at 22:58
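If the stacked frame already exists, another option (not from either answer) is to collapse each column independently instead of shifting by a fixed amount. A minimal sketch with placeholder data, assuming every blank cell is a layout gap rather than a genuinely missing value:

```python
import pandas as pd

# Placeholder version of the stacked layout from the question: the first
# two columns have data only in row 0, the last two only in row 1.
stacked = pd.DataFrame({
    "Title": ["Study A", None],
    "Sponsor": ["Sponsor X", None],
    "D": [None, "d1"],
    "E": [None, "e1"],
})

# For each column, drop the missing cells and renumber from 0 so the
# remaining values move to the top; apply then lines the columns up by
# their new index, leaving a single row here.
collapsed = stacked.apply(lambda col: col.dropna().reset_index(drop=True))
print(collapsed)
```

Because this removes every missing value, it would also pull up any cell that was legitimately empty; building the frame with pd.concat(..., axis=1) in the first place avoids the problem entirely.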
