0

I have a for loop that iterates over countries, and each country column holds either a yes or a no. I want the corresponding animal to be added to a list each time a yes is encountered. For example, I have a list like

Countries = ['Germany','France'..etc etc]

My DataFrame looks something like this:

animal  Germany  France  
Rabbit    yes       yes
Bear      no        yes
...

I want a list of animals that contains one entry for every yes in the countries selected in the Countries list. So in the example above, I would want

animal_list = [Rabbit, Rabbit, Bear]

My main code goes something like this. My attempt is below, but it doesn't work. Is there a clean way of doing it?

 Countries = ['Germany','France'..etc etc]
 animals_list = []
 for country in Countries:
   animal_list = animal_list.append(df[df[country] == 'yes'],'animal'])

The for loop is required, so I can't do this in a single pandas operation.

Jason_Leto

3 Answers

2

Assuming you have a DataFrame like this:

import pandas as pd

data = {'animal': ['Rabbit', 'Bear'],
    'Germany': ['yes', 'no'],
    'France': ['yes', 'yes']  # Bear is 'yes' for France, as in the question
   }
df = pd.DataFrame(data)

If the wanted countries are given in a list:

# In Python, try to use lowercase, underscore-separated names for your variables (PEP 8)

countries = ['Germany', 'France']

Then you can select those columns:

# Select the countries that you want
df_subset = df[df.columns.intersection(countries)]

And count the number of yes values for each animal:

animals_index_to_num_yes = df_subset.eq('yes').sum(axis=1)

In this way the list can be created very easily:

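animals_list = []

# Series.items() replaces iteritems(), which was removed in pandas 2.0
for index, animal in df['animal'].items():
    occurrences = animals_index_to_num_yes.get(index)
    # Add the animal once for every 'yes' in its row
    animals_list.extend(
        [animal] * occurrences
    )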

Notes:

  1. Try to avoid for loops in pandas as much as possible. In general, built-in methods perform better because they are vectorized. See this excellent answer for more.
  2. In your case, since the order of the animals in the output list matters, I'm not sure the loop can be avoided, so I used a for loop.
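
If the output only needs to be grouped by animal (the same order the loop above produces), the loop can possibly be dropped by using Series.repeat. This is a minimal sketch, assuming the sample data from the question; the name yes_counts is only for illustration.

```python
import pandas as pd

# Sample data as described in the question (Bear is 'yes' for France).
df = pd.DataFrame({
    'animal': ['Rabbit', 'Bear'],
    'Germany': ['yes', 'no'],
    'France': ['yes', 'yes'],
})
countries = ['Germany', 'France']

# Count the 'yes' entries per row across the selected country columns.
yes_counts = df[countries].eq('yes').sum(axis=1)

# Repeat each animal once per 'yes'; animals with a count of zero drop out.
animals_list = df['animal'].repeat(yes_counts.to_numpy()).tolist()
print(animals_list)
```

Note that this groups the result by animal, not by country, so it may not fit if the per-country order matters.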
Danial
0
animals_list=[]
# Column names are case-sensitive, so they must match the DataFrame exactly
country_list=['Germany','France']

for i in range(len(df)):
    for country in country_list:
        if df[country].iloc[i]=='yes':
            animals_list.append(df.animal.iloc[i])

print(animals_list)

Output : ['Rabbit', 'Rabbit', 'Bear']

Bharat Adhikari
0

I found a very simple solution which seems to do the trick for me.

Countries = ['Germany','France'..etc etc]
animals_list = []
for country in Countries:
   animals = df.loc[df[country] == 'yes', 'animal'].tolist()
   animals_list = animals_list + animals
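
For reference, here is a self-contained sketch of the same per-country loop, assuming the two-country sample data from the question (the DataFrame literal is my reconstruction of it, not the real data). It uses extend to grow the list in place; reassigning the result of list.append would not work, because append returns None.

```python
import pandas as pd

# Reconstructed sample data; the real DataFrame has more countries and animals.
df = pd.DataFrame({
    'animal': ['Rabbit', 'Bear'],
    'Germany': ['yes', 'no'],
    'France': ['yes', 'yes'],
})
countries = ['Germany', 'France']

animals_list = []
for country in countries:
    # Boolean mask picks the rows marked 'yes' for this country,
    # and .loc selects only the 'animal' column for those rows.
    matches = df.loc[df[country] == 'yes', 'animal']
    animals_list.extend(matches.tolist())

print(animals_list)
```

The result is grouped by country: first every animal with a yes for Germany, then every animal with a yes for France.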
Jason_Leto