3

I have a Pandas dataframe that looks as follows:

name1   country1    name2   country2
A       GER         B       USA
C       GER         E       GER
D       GER         Y       AUS
E       GER         A       USA

I want to get a new dataframe with two columns name and country that contains the unique pairs of (name1, country1) and (name2,country2).

The expected result should look like this:

name    country 
A       GER     
C       GER     
D       GER     
E       GER     
B       USA
A       USA
Y       AUS

I have found something similar for single columns here. However, I do not know how to transform this solution to my problem, that is, pairs of columns.

beta
  • 5,324
  • 15
  • 57
  • 99

1 Answers1

5

First filter columns by filter, transpose, flatten values and create new DataFrame by constructor:

a = df.filter(like='name').values.T.ravel()
b = df.filter(like='country').values.T.ravel()
df = pd.DataFrame({'name':a, 'country':b}, columns=['name','country'])
print (df)
  name country
0    A     GER
1    C     GER
2    D     GER
3    E     GER
4    B     USA
5    E     GER
6    Y     AUS
7    A     USA

Another solution with undocumented function lreshape:

df = pd.lreshape(df, {'name':['name1','name2'],
                      'country':['country1','country2']})
print (df)
  name country
0    A     GER
1    C     GER
2    D     GER
3    E     GER
4    B     USA
5    E     GER
6    Y     AUS
7    A     USA

And last for unique pairs use drop_duplicates:

df = df.drop_duplicates()
print (df)
  name country
0    A     GER
1    C     GER
2    D     GER
3    E     GER
4    B     USA
6    Y     AUS
7    A     USA
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252