write rows in pandas dataframe and append it to existing dataframe

Question

The output of my script is a year and the count of a word in articles from that particular year:

I want to have each year added as a new column to my existing dataframe which contains only words.

Expected output:

Terms 2013  2014  2015 
abc   118   76    90
xyz   23    0     36

The input for my script was a CSV file:

Terms
xyz
abc
efg

The script I wrote is :

import urllib.request
import xml.etree.ElementTree as ET

import pandas as pd

df = pd.read_csv('a.csv', header=None)

# URL template as posted (abbreviated in the question)
u = "http: term=%s&mindate=%d/01/01&maxdate=%d/12/31"
startYear = 2013
endYear = 2018

for row in df.itertuples():
    term = str(row[1])
    print(term)

    # query every year for the current term
    for year in range(startYear, endYear + 1):
        url = u % (term.replace(" ", "+"), year, year)
        page = urllib.request.urlopen(url).read()
        doc = ET.XML(page)
        count = doc.find("Count").text
        print(year)
        print(count)

The output of df.head() is:

                         0
0           1,2,3-triazole
1  16s rrna gene amplicons

Any help will be greatly appreciated, thanks in advance !!

`the output of my script`: Is this a `list`, output from `print`, or something else? We need to know what you are starting with to help you reach your destination. — jpp, Jun 21 '18 at 10:15
Sorry, for not being clear. It is the `list` output from `print` — K.S, Jun 21 '18 at 10:20
Nope still not clear. What does `list` output from `print` mean? Think of it this way, what can we copy-paste into our code to replicate the object containing all those items? — jpp, Jun 21 '18 at 10:25
Update your question, please. No code in comments. No images / links either. — jpp, Jun 21 '18 at 11:46
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/173558/discussion-between-k-s-and-jpp). — K.S, Jun 21 '18 at 12:19

Petronella · Answer 1 · 2018-06-21T12:17:00.270

I would read the CSV into a NumPy array, reshape it with NumPy, and then convert the resulting 2D array to a DataFrame.
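A minimal sketch of the reshape idea, assuming the counts are already available as one flat sequence ordered term by term (all years for the first term, then all years for the second). The names `terms`, `years` and `flat_counts` are illustrative and not from the question; the numbers are taken from the expected output shown there.

```python
import numpy as np
import pandas as pd

# Assumed inputs: the terms, the years, and a flat list of counts ordered
# term by term (every year for "abc", then every year for "xyz").
terms = ["abc", "xyz"]
years = [2013, 2014, 2015]
flat_counts = [118, 76, 90, 23, 0, 36]

# Reshape the flat sequence into one row per term and one column per year.
matrix = np.array(flat_counts).reshape(len(terms), len(years))

# Wrap the 2D array in a DataFrame and put the terms in the first column.
result = pd.DataFrame(matrix, columns=years)
result.insert(0, "Terms", terms)
print(result)
```

This only works if every term has a count for every year; a missing value would make the reshape fail, so a dict-based approach may be safer for ragged data.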

edited Jun 21 '18 at 12:17

answered Jun 21 '18 at 12:06

Not very familiar with numpy, can you help me with this – K.S Jun 21 '18 at 12:13
to read the file : https://stackoverflow.com/questions/3518778/how-to-read-csv-into-record-array-in-numpy and to reshape, sorry, not resize: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.reshape.html and from array to DataFrame: https://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.DataFrame.html – Petronella Jun 21 '18 at 12:15
This is more of a comment. If you could add some sample code to solve OP's problem, that would be ideal – DJK Jun 21 '18 at 12:56
DJK, I do not know what the CSV looks like, therefore I cannot actually give a code solution. The links in my comment, though, lead to many code examples for each step. I did not think it fair to copy from other sources, so I just gave the sources. – Petronella Jun 21 '18 at 14:58

hootnot · Accepted Answer · 2018-06-21T13:25:01.073

Something like this should do it:

#!/usr/bin/env python 

def mkdf(filename):
    def combine(term, l):
        d = {"term": term}
        d.update(dict(zip(l[::2], l[1::2])))
        return d

    term = None
    other = []
    with open(filename) as I:
        n = 0
        for line in I:
            line = line.strip()
            try:
                int(line)
            except ValueError:
                # not an int, so this line starts a new term
                if term:    # if we have one, create the record
                    yield combine(term, other)

                term = line
                other = []
                n = 0
            else:
                if n > 0:
                    other.append(line)
            n += 1

        # and the last one 
        yield combine(term, other)

if __name__ == "__main__":
    import pandas as pd
    import sys

    df = pd.DataFrame([r for r in mkdf(sys.argv[1])])
    print(df)

Usage: python scriptname.py /tmp/IN (or another file with your data)

Output:

  2013 2014  term
0  118   23  abcd
1    1   45   xyz
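If the counts can be collected inside the question's own loop rather than parsed back from printed output, a possible alternative is to build one dict per term and hand the list to pandas. This is only a sketch: `counts_frame` and `fetch_count` are hypothetical names, and `fetch_count` stands in for the URL request and XML parsing in the question. The sample numbers come from the expected output in the question.

```python
import pandas as pd


def counts_frame(terms, years, fetch_count):
    """Build one row per term with one column per year.

    fetch_count is a hypothetical callable (term, year) -> count that
    stands in for the URL request and XML parsing in the question's loop.
    """
    rows = []
    for term in terms:
        row = {"Terms": term}
        for year in years:
            # doc.find("Count").text is a string, so convert it to int
            row[year] = int(fetch_count(term, year))
        rows.append(row)
    # Fix the column order: the term first, then the years in sequence.
    return pd.DataFrame(rows, columns=["Terms", *years])


# Stand-in data copied from the expected output in the question.
sample = {("abc", 2013): 118, ("abc", 2014): 76, ("abc", 2015): 90,
          ("xyz", 2013): 23, ("xyz", 2014): 0, ("xyz", 2015): 36}
result = counts_frame(["abc", "xyz"], range(2013, 2016),
                      lambda term, year: sample[(term, year)])
print(result)
```

To attach these columns to an existing frame that already has a Terms column, merging on that column with `DataFrame.merge(result, on="Terms")` is one option.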

the file with the data you mentioned. I've changed the script to accept the file via commandline. — hootnot, Jun 21 '18 at 13:28
