Add a column to a pandas dataframe with A, B, C repeating

Question

How can I add a column to a pandas dataframe with the values 'A', 'B', 'C', 'A', 'B', etc., i.e. ABC repeating down the rows? I also need to vary the letter that is assigned to the first row (i.e. it could start ABCAB..., BCABC... or CABCA...).

I can get as far as:

df.index % 3

which gets me the index as 0,1,2 etc, but I cannot see how to get that into a column with A, B, C.

Many thanks,

Julian

So, each row should contain a string of these three letters randomized... right? — Anwarvic, Jun 03 '20 at 18:48
You want to `pivot` if I understand you correctly. See the question and answer [here](https://stackoverflow.com/questions/28337117/how-to-pivot-a-dataframe-in-pandas) — Erfan, Jun 03 '20 at 18:49
Your question is unclear. Please provide a [mcve] including sample input and expected output, according to [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — G. Anderson, Jun 03 '20 at 18:53

score 2 · Answer 1 · answered Jun 03 '20 at 18:59

If I've understood your question correctly, you can create a list of the letters as follows, and then add that to your dataframe:

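from itertools import cycle
from random import randint

letter_generator = cycle('ABC')  # endless A, B, C, A, B, C, ...
offset = randint(0, 2)  # 0 starts at 'A', 1 at 'B', 2 at 'C'; set it explicitly to choose the start
dataframe_length = 10  # or use len(your_dataframe) to avoid hardcoding it
# Generate `offset` extra letters, then drop them from the front.
column = [next(letter_generator) for _ in range(dataframe_length + offset)]
column = column[offset:]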
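
A minimal sketch of the same idea using `itertools.islice`, which skips the first `offset` letters instead of building a longer list and slicing it. The helper name `repeating_letters` and its parameters are made up for illustration:

```python
from itertools import cycle, islice


def repeating_letters(n, letters="ABC", offset=0):
    """Return n letters cycling through `letters`, starting at letters[offset]."""
    start = offset % len(letters)
    # islice skips the first `start` items of the endless cycle and
    # stops after n more, so nothing extra is generated or sliced off.
    return list(islice(cycle(letters), start, start + n))


print(repeating_letters(7, offset=1))  # ['B', 'C', 'A', 'B', 'C', 'A', 'B']
```

To attach the result to a dataframe, something like `df['letter'] = repeating_letters(len(df), offset=1)` should work, where `df` and the column name `letter` are placeholders.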

ap1997

This would be much more efficient if you were to discard a number of elements according to `offset` and generate `column` without any slicing, and avoiding repeated `next()` calls by using the `zip(range())` idiom: `[x for _, x in zip(range(dataframe_length), letter_generator)]` – norok2 Jun 03 '20 at 19:15
Assuming an existing DataFrame: No need to use `range` - `[next(letter_generator) for _ in df.iloc[0]]` – wwii Jun 03 '20 at 19:15
@wwii still calling `next()` a lot, while `zip()` would be much faster. – norok2 Jun 03 '20 at 19:18
@norok2 1.5 seconds vs 1.1 seconds for ten million items. – wwii Jun 03 '20 at 23:20
Another excellent answer from ap1997 with great refinements from norok2 and wwii. Thanks. Another approach to the offset would be `cycle('ABCAB'[offset:offset+3])`, which would also avoid slicing the dataframe. – Julian7 Jun 04 '20 at 05:22

score 2 · Answer 2 · answered Jun 03 '20 at 19:20

What I would do:

df['col'] = (df.index % 3).map({0: 'A', 1: 'B', 2: 'C'})
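
Note that `df.index % 3` relies on the index being the default 0, 1, 2, ... integers. A possible variant, assuming NumPy is available, works from row positions instead and adds a start offset. The frame, the column name `col` and the variable names here are only illustrative:

```python
import numpy as np
import pandas as pd

letters = np.array(list("ABC"))
offset = 2  # 0 starts at 'A', 1 at 'B', 2 at 'C'

# Hypothetical frame with string labels, where `df.index % 3` is not usable.
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=list("vwxyz"))

# np.arange gives row positions 0..len(df)-1 regardless of the index labels.
positions = (np.arange(len(df)) + offset) % len(letters)
df["col"] = letters[positions]  # integer-array indexing picks one letter per row

print(df["col"].tolist())  # ['C', 'A', 'B', 'C', 'A']
```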

BENY

Perhaps `offset = 0; items = 'ABC'; n = len(items); {(i + offset) % n: x for i, x in enumerate(items)}` instead of `{0:'A',1:'B',2:'C'}`? – norok2 Jun 03 '20 at 19:52
YOBEN_S's answer is very clear and straightforward. norok2's comment is a good way to change the start point with an offset. Thanks both. – Julian7 Jun 04 '20 at 05:16
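
For reference, a small sketch of how the offset mapping from the comment behaves for each offset, assuming a default integer index. The frame and the `col_0`, `col_1`, `col_2` column names are invented for the example:

```python
import pandas as pd

items = "ABC"
n = len(items)
df = pd.DataFrame({"value": range(6)})  # hypothetical frame, default index 0..5

for offset in range(n):
    # The comprehension from the comment: letter i goes to key (i + offset) % n.
    mapping = {(i + offset) % n: x for i, x in enumerate(items)}
    df[f"col_{offset}"] = (df.index % n).map(mapping)

# Letters assigned to the first row for offset 0, 1 and 2.
print(df.loc[0, ["col_0", "col_1", "col_2"]].tolist())  # ['A', 'C', 'B']
```

With this form the pattern shifts down the rows as `offset` grows, so offset 1 starts at 'C' and offset 2 at 'B'. If the first row should instead get `items[offset]`, using `(i - offset) % n` as the key appears to give that.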
