I have a pandas dataframe that looks like this...

index my_column
0
1
2
3
4
5
6

What I need to do is conditionally assign values to 'my_column' depending on the index. The first three rows should have the values 'dog', 'cat', 'bird'. Then, the next three rows should also have 'dog', 'cat', 'bird'. That pattern should apply until the end of the dataset.

index my_column
0 dog
1 cat
2 bird
3 dog
4 cat
5 bird
6 dog

I've tried the following code to no avail.

for index, row in df.iterrows():
    counter=3
    my_column='dog'
    if counter>3
    break
    else 
    counter+=1
    my_column='cat'
    counter+=1
    if counter>3
    break
    else 
    counter+=1
    my_column='bird'
    if counter>3
    break  
ealfons1
  • As mentioned in other answers, your code has numerous logical and syntactical errors. The `break` keyword is used to exit a loop. You can't ever go back into a loop after breaking out of it, so if you want to repeat over the elements of an iterable, you need to stay in the loop but find a way to reset your counter. This is most easily achieved with the modulo operator: `%`. It's a really nifty operator with a ton of fascinating mathematical properties known collectively as modular arithmetic. – ddejohn Nov 23 '22 at 03:55
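To illustrate the modulo idea from the comment above, here is a minimal sketch in plain Python. The names `values` and `labels` and the row count of 7 are assumptions chosen to match the example in the question; no dataframe is involved yet.

```python
values = ["dog", "cat", "bird"]

# i % 3 cycles through 0, 1, 2, 0, 1, 2, ... so the counter "resets"
# by itself and no break statement is needed.
labels = [values[i % len(values)] for i in range(7)]

print(labels)
# ['dog', 'cat', 'bird', 'dog', 'cat', 'bird', 'dog']
```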

3 Answers

Several problems:

  1. Your if syntax is incorrect: you are missing colons and proper indentation.
  2. You are breaking out of your loop, which terminates it early, instead of using an if, elif, else structure.
  3. You are trying to update your dataframe while iterating over it.

See this question about why you shouldn't update while you iterate.

Instead, you could do:

values = ["dog", "cat", "bird"]

num_values = len(values)

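# df.index is an attribute, not a method; calling df.index() raises
# "'Int64Index' object is not callable"
for index in df.index:
    df.at[index, "my_column"] = values[index % num_values]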
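The loop above relies on the index labels being the integers 0, 1, 2, and so on. If that is not the case for your dataframe, a position-based variant may be safer. The following is only a sketch: the sample frame with labels 10 to 70 and the name `positions` are assumptions made for illustration, not something taken from the question.

```python
import numpy as np
import pandas as pd

values = np.array(["dog", "cat", "bird"])

# Hypothetical frame whose index labels are NOT 0..n-1.
df = pd.DataFrame({"my_column": [""] * 7}, index=[10, 20, 30, 40, 50, 60, 70])

# Use row positions rather than index labels, so the labels do not matter.
positions = np.arange(len(df))
df["my_column"] = values[positions % len(values)]

print(df)
```

Assigning a NumPy array is positional, so no index alignment takes place and no explicit loop is needed.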
    
Dash
  • I tried this solution but was getting the following error: 'Int64Index' object is not callable. I'm running your code inside a function. – ealfons1 Nov 23 '22 at 04:22

Advanced indexing

One solution would be to turn dog-cat-bird into a pd.Series and use advanced indexing:

dcb = pd.Series(["dog", "cat", "bird"])

df["my_column"] = dcb[df.index % len(dcb)].reset_index(drop=True)

This works by first creating an index array from df.index % len(dcb):

In [8]: df.index % len(dcb)
Out[8]: Int64Index([0, 1, 2, 0, 1, 2, 0], dtype='int64')

Then, by using advanced indexing, you can select the elements from dcb with that index array:

In [9]: dcb[df.index % len(dcb)]
Out[9]:
0     dog
1     cat
2    bird
0     dog
1     cat
2    bird
0     dog
dtype: object

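Notice that the index of the resulting Series repeats (0, 1, 2, 0, 1, 2, 0). Assigning a Series to a dataframe column aligns on index labels, so replace the repeated index with a fresh 0..n-1 index using .reset_index(drop=True), then assign the result to your dataframe.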
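As an alternative to resetting the index, the selected values can be converted to a NumPy array, which is assigned by position rather than by label. This is a sketch under the assumption of a seven-row frame like the one in the question; the name `picked` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"my_column": [""] * 7})
dcb = pd.Series(["dog", "cat", "bird"])

# Same selection as above; the result has the repeating index 0, 1, 2, 0, ...
picked = dcb[df.index % len(dcb)]

# to_numpy() discards the index, so the assignment is purely positional.
df["my_column"] = picked.to_numpy()

print(df["my_column"].tolist())
# ['dog', 'cat', 'bird', 'dog', 'cat', 'bird', 'dog']
```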

Using a generator

Here's an alternate solution using an infinite dog-cat-bird generator:

In [2]: df
Out[2]:
  my_column
0
1
2
3
4
5
6

In [3]: def dog_cat_bird():
   ...:     while True:
   ...:         yield from ("dog", "cat", "bird")
   ...:

In [4]: dcb = dog_cat_bird()

In [5]: df["my_column"].apply(lambda _: next(dcb))
Out[5]:
0     dog
1     cat
2    bird
3     dog
4     cat
5    bird
6     dog
Name: my_column, dtype: object
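The standard library's itertools.cycle does the same job as the hand-written generator. Below is a minimal sketch that also assigns the result back to the column; the seven-row frame is an assumption modelled on the question.

```python
from itertools import cycle, islice

import pandas as pd

df = pd.DataFrame({"my_column": [""] * 7})

# cycle() repeats the tuple forever; islice() takes exactly len(df) items.
df["my_column"] = list(islice(cycle(("dog", "cat", "bird")), len(df)))

print(df["my_column"].tolist())
# ['dog', 'cat', 'bird', 'dog', 'cat', 'bird', 'dog']
```

Because a plain list is assigned, the values are placed by position and the dataframe's index labels play no role.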
ddejohn
  • I tried both methods, but the result was that the assignment of new values skipped a row for some reason. – ealfons1 Nov 23 '22 at 04:23
  • Sounds like you copy-pasted something incorrectly. No offense, but the code above is proof that the solution does exactly what you asked for, which means that there's something about your specific dataframe that doesn't match the question. It's not entirely clear what you mean by "skipped a row". – ddejohn Nov 23 '22 at 04:27

Create a dictionary:

pet_dict = {0:'dog',
            1:'cat',
            2:'bird'}

With axis=1, each row is passed to the lambda as a Series whose .name is that row's index label. Taking the label modulo 3 (the % operator) gives the dictionary key for your desired result:

df.apply(lambda x: pet_dict[x.name % 3], axis=1)
0     dog
1     cat
2    bird
3     dog
4     cat
5    bird
6     dog
7     cat
8    bird
9     dog
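The same dictionary lookup can also be done without a row-wise apply by mapping the whole index at once. This is a sketch, assuming integer index labels and a seven-row frame like the one in the question; the name `keys` is introduced here for illustration.

```python
import pandas as pd

pet_dict = {0: "dog", 1: "cat", 2: "bird"}
df = pd.DataFrame({"my_column": [""] * 7})

# Build the keys 0, 1, 2, 0, ... for every row, keeping the frame's own index
# so the assignment below aligns correctly.
keys = pd.Series(df.index % 3, index=df.index)
df["my_column"] = keys.map(pet_dict)

print(df["my_column"].tolist())
# ['dog', 'cat', 'bird', 'dog', 'cat', 'bird', 'dog']
```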
gputrain