Concatenate values into Panda Series

Question

I have the following response from an API request:

    <movies>
    <movie>
    <rating>5</rating>
    <name>star wars</name>
    </movie>
    <movie>
    <rating>8</rating>
    <name>jurassic park</name>
    </movie>
    </movies>

is there a way to take this information and obtain the rating and name values and store inside a series in Pandas?

The end result would look like this:

      Movie Rating  
     5 - star Wars  
 8 - Jurassic park

You'll notice I've taken taking each of the values found in my response and added them to the one column. I was looking to add the 5 concatenate '-' and the star wars together for example.

Does this answer your question? [How to convert an XML file to nice pandas dataframe?](https://stackoverflow.com/questions/28259301/how-to-convert-an-xml-file-to-nice-pandas-dataframe) — G. Anderson, Mar 06 '20 at 22:13
A nice tutorial around the same idea here: https://medium.com/@robertopreste/from-xml-to-pandas-dataframes-9292980b1c1c — Sinan Kurmus, Mar 06 '20 at 22:14
I'm probably not doing a good job at exampling so I apologize, I know I can take each value and map to its own column(Series), what I am looking to do is capture two values from the response as pointed in my question and place these two values under the same column. From the documentation I just read it seems to cover making new columns per value which I didn't want to do. — ApacheOne, Mar 06 '20 at 22:22

score 1 · Accepted Answer · answered Mar 06 '20 at 23:34

Is this what you are looking for? I have explained step-by-step in the code. There was one part I did not know how to do, but I researched and figured it out.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Data' : ['<movies>','<movie>','<rating>5</rating>',
                             '<name>star wars</name>', '</movie>', 
                             '<rating>8</rating>', '<name>jurassic park</name>', 
                             '</movie>', '</movies>']})
#Filter for the relevant rows of data based upon the logic of the pattern. I have also 
#done an optional reset of the index.
df = df.loc[df['Data'].str.contains('>.*<', regex=True)].reset_index(drop=True)
#For the rows we just filtered for, get rid of the irrelevant data with some regex 
#string manipulation
df['Data'] = df['Data'].str.findall('>.*<').str[0].replace(['>','<'], '', regex=True)
#Use join with shift and add_suffix CREDIT to @joelostblom:
#https://stackoverflow.com/questions/47450259/merge-row-with-next-row-in-dataframe- 
#pandas
df = df.add_suffix('1').join(df.shift(-1).add_suffix('2'))
#Filter for numeric rows only
df = df.loc[df['Data1'].str.isnumeric() == True]
#Combine Columns with desired format
df['Movie Rating'] = df['Data1'] + ' - ' + df['Data2']
#Filter for only relevant column and print dataframe
df = df[['Movie Rating']]
print(df)

This is it! thank you for the thorough explanation! – ApacheOne Mar 07 '20 at 00:07 — ApacheOne, Mar 07 '20 at 00:07

Concatenate values into Panda Series

1 Answers1