
I'm trying to scrape some data from a website and have managed to extract the important information, yet when I write it to an Excel file the data all ends up in one column. Is there a solution using the code provided, or will I need to create multiple outputs and then write those out?

I'm very new to web scraping. I've tried using .join, which put all the data into one row as I want; however, it's all concatenated into a single column.


# page_soup is the BeautifulSoup object for the fight page; f is the output file opened earlier
totals = page_soup.findAll("p", {"class":"b-fight-details__table-text"})

for i in totals:
    stats = i.text.replace("\n", " ")
    print(stats, end=" ")
    f.write(stats)

f.close()

 Stephen Thompson   Anthony Pettis         0            1            47 of 107            32 of 55            43%            58%            47 of 107 

That is the current output, but it is all stuck in one column. I want it to look like the example below (I will add headers in the code for the output):

Fighter A        Fighter B      KD  TKD  S     TS  
Stephen Thompson Anthony Pettis 0   1    47 of 107 32 of 55 43% 58% etc...
Til
  • Can we have the url? – QHarr Mar 28 '19 at 03:39
  • http://www.ufcstats.com/fight-details/56ae02578b1163ee – Matt Visintin Mar 28 '19 at 03:47
  • Don't you want fighters on separate rows? I see they have used paragraphs to separate as opposed to new rows. Will you be writing multiple times to the output file in separate runs or just scrape once and writing to file? – QHarr Mar 28 '19 at 03:52
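Following up on that comment: the printed output alternates between the two fighters (Thompson, Pettis, 0, 1, 47 of 107, 32 of 55, ...), which suggests each table cell holds one <p> per fighter. A minimal sketch, assuming that alternating order holds for every cell, that splits the extracted texts into one CSV row per fighter with the standard-library csv module (split_by_fighter is a hypothetical helper name):

```python
import csv
import io


def split_by_fighter(texts):
    """Split a flat list of cell texts into (fighter_a, fighter_b).

    Assumes the <p> texts alternate: fighter A, fighter B, fighter A, ...
    """
    return texts[0::2], texts[1::2]


# Sample values taken from the printed output in the question
texts = ["Stephen Thompson", "Anthony Pettis", "0", "1",
         "47 of 107", "32 of 55", "43%", "58%"]

fighter_a, fighter_b = split_by_fighter(texts)

buf = io.StringIO()  # stands in for a real file opened with newline=""
writer = csv.writer(buf)
writer.writerows([fighter_a, fighter_b])  # one row per fighter
print(buf.getvalue())
```

Slicing with a step of 2 keeps the pairing intact without tracking indexes by hand; if the page ever adds a cell with a single <p>, the alternation assumption breaks and the rows would need to be built per table cell instead.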

3 Answers


Just change the end argument of print:

for i in totals:
    stats = i.text.strip()        
    print(stats, end = " ")
    #...#

It should work.
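For reference, print() appends "\n" after its arguments by default, and the end parameter replaces that terminator. A small sketch using io.StringIO as a stand-in for the console (via print's file parameter) shows the difference. Note that this only changes how the text is laid out; a spreadsheet will still not split space-separated values into columns:

```python
import io

texts = ["Stephen Thompson", "Anthony Pettis", "0", "1"]

# Default end="\n": every value lands on its own line
one_per_line = io.StringIO()
for t in texts:
    print(t, file=one_per_line)

# end=" ": values stay on one line, separated by spaces
one_line = io.StringIO()
for t in texts:
    print(t, end=" ", file=one_line)

print(repr(one_per_line.getvalue()))
print(repr(one_line.getvalue()))
```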

If you want the same in the output file you write, replace:

f.write(stats + "\n")

with:

f.write(stats + " ")

For example:

with open("out.txt", "w") as f:
    for i in totals:
        stats = i.text.strip()
        print(stats, end=" ")
        f.write(stats + " ")

If it is the string itself that contains "\n" characters, you can replace them:

with open("out.txt", "w") as f:
    for i in totals:
        # i is a BeautifulSoup tag, so take its .text before calling str.replace
        stats = i.text.replace("\n", " ")
        print(stats, end=" ")
        f.write(stats + " ")
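The long runs of spaces in the question's printed output suggest each cell's text contains newlines plus indentation. replace("\n", " ") turns the newlines into spaces but keeps the padding, and strip() only trims the ends. A sketch of one common alternative, splitting on any whitespace and rejoining with single spaces (clean_cell is a hypothetical helper; the sample string is made up to mimic that padding):

```python
def clean_cell(raw):
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space and trim both ends."""
    return " ".join(raw.split())


# Made-up sample mimicking a padded table cell
raw = "\n      47 of 107\n    "

replaced = raw.replace("\n", " ")  # newlines become spaces, padding stays
stripped = raw.strip()             # trims the ends only
cleaned = clean_cell(raw)          # collapses everything to single spaces

print(repr(replaced))
print(repr(stripped))
print(repr(cleaned))

# strip() alone leaves whitespace in the middle of a value untouched
print(repr("Stephen\n   Thompson".strip()), repr(clean_cell("Stephen\n   Thompson")))
```

In the loop above, this would be used as stats = clean_cell(i.text).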
cccnrc
  • Seemed to do the trick, except that it's still in one column when I open the CSV file – Matt Visintin Mar 28 '19 at 03:36
  • Have you changed the f.write(stats + "\n") to f.write(stats + " ") ? If you write as I put in the answer it should write each i-element space separated. – cccnrc Mar 28 '19 at 03:39
  • Currently the code is for i in totals: stats = i.text.strip(); print(stats, end = " "); f.write(stats + " ") followed by f.close(), which ends up putting the output into one column. I'm able to use Text to Columns in Excel to fix it – Matt Visintin Mar 28 '19 at 03:44
  • It could be that the string itself contains \n characters. Try this: stats = i.replace("\n", " ") and it should work (or stats = i.text.replace("\n", " ") if you need to convert it to a string). – cccnrc Mar 28 '19 at 03:47
  • It still goes into one column, but it can be cleanly transferred with Text to Columns in Excel; I think I can work with this. – Matt Visintin Mar 28 '19 at 04:02
  • But you should find out the problem. Can you please edit your answer with an example of the string input you have? – cccnrc Mar 28 '19 at 04:03
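A likely reason the data stays in one column: when Excel opens a .csv file it splits each line on commas (or the list separator configured for your locale), not on spaces, so space-separated text all lands in column A. A minimal sketch, assuming the cell texts have already been extracted into a list of strings, that writes them as one comma-separated row with the standard-library csv module, which also quotes any value that itself contains a comma (write_fight_row is a hypothetical helper name):

```python
import csv
import io


def write_fight_row(texts, out, header=None):
    """Write one fight's cell texts to `out` as a single CSV row.

    `out` is any text file object; for a real file, open it with
    open("out.csv", "w", newline="") so csv controls the line endings.
    """
    writer = csv.writer(out)
    if header is not None:
        writer.writerow(header)
    # Collapse newlines and padding inside each cell before writing
    writer.writerow([" ".join(t.split()) for t in texts])


texts = ["\n  Stephen Thompson\n", "Anthony Pettis", "0", "1",
         "47 of 107", "32 of 55", "43%", "58%"]
buf = io.StringIO()
write_fight_row(texts, buf)
print(buf.getvalue())
```

With the output opened this way, each value should land in its own column without needing Text to Columns.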

You could try replacing the line print(stats) with print(stats, end = " ").

Ozcar Nguyen

If you are only doing this once and are happy with that layout (content separated by p tags ends up in the same cell), you could use pandas:

import pandas as pd   
tables = pd.read_html('http://www.ufcstats.com/fight-details/56ae02578b1163ee')
df = tables[0]
df.to_csv(r'C:\Users\User\Desktop\data.csv', sep=',', encoding='utf-8-sig',index = False )

If you want to use pandas to append data for multiple fights, see this answer:

https://stackoverflow.com/a/17135044/6241235
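The same append idea can also be sketched without pandas: open the output CSV in append mode for each fight and write the header only when the file is new or empty. A minimal sketch using the standard-library csv module; append_fight is a hypothetical helper, and the header you pass is up to you:

```python
import csv
import os


def append_fight(path, row, header):
    """Append one fight's row to the CSV at `path`, writing `header` first if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" lets the csv module handle line endings itself
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerow(row)
```

Calling it once per scraped fight builds up a single file, with the header written only on the first call.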

QHarr
  • I will be changing my code to be able to read a CSV file with "/56ae02578b1163ee". Will pandas be able to incorporate multiple fights? – Matt Visintin Mar 28 '19 at 04:18
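On reading fight IDs from a CSV: a minimal sketch, assuming the file has one ID like "/56ae02578b1163ee" in the first column of each row, that turns each ID into a fight-details URL (fight_urls is a hypothetical helper; the URL pattern is taken from the link earlier in the thread). Each URL could then be passed to pd.read_html as in the pandas answer, and the resulting tables combined with pd.concat or appended to one CSV.

```python
import csv
import io

BASE_URL = "http://www.ufcstats.com/fight-details"


def fight_urls(id_file):
    """Yield a fight-details URL for every non-empty first-column ID in `id_file`."""
    for row in csv.reader(id_file):
        if not row or not row[0].strip():
            continue  # skip blank lines
        fight_id = row[0].strip().lstrip("/")
        yield f"{BASE_URL}/{fight_id}"


# io.StringIO stands in for open("fight_ids.csv", newline="")
ids = io.StringIO("/56ae02578b1163ee\n\n")
print(list(fight_urls(ids)))
```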