How can I add a column to a pandas DataFrame with the values 'A', 'B', 'C', 'A', 'B', etc., i.e. ABC repeating down the rows? I also need to vary the letter that is assigned to the first row (i.e. it could start ABCAB..., BCABC... or CABCA...).

I can get as far as:

df.index % 3

which gives me 0, 1, 2, 0, 1, ... down the rows, but I cannot see how to turn that into a column containing A, B, C.

Many thanks,

Julian

sakurashinken
Julian7
  • So, each row should contain a string of these three letters randomized... right? – Anwarvic Jun 03 '20 at 18:48
  • Use itertools.cycle to generate the values. – wwii Jun 03 '20 at 18:48
  • You want to `pivot` if I understand you correctly. See the question and answer [here](https://stackoverflow.com/questions/28337117/how-to-pivot-a-dataframe-in-pandas) – Erfan Jun 03 '20 at 18:49
  • Your question is unclear. Please provide a [mcve] including sample input and expected output, according to [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – G. Anderson Jun 03 '20 at 18:53

2 Answers


If I've understood your question correctly, you can create a list of the letters as follows, and then add that to your dataframe:

from itertools import cycle
from random import randint

letter_generator = cycle('ABC')
offset = randint(0, 2)
dataframe_length = 10 # or just use len(your_dataframe) to avoid hardcoding it
column = [next(letter_generator) for _ in range(dataframe_length + offset)]
column = column[offset:]
ap1997
  • This would be much more efficient if you were to discard a number of elements according to `offset` and generate `column` without any slicing, and avoiding repeated `next()` calls by using the `zip(range())` idiom: `[x for _, x in zip(range(dataframe_length), letter_generator)]` – norok2 Jun 03 '20 at 19:15
  • Assuming an existing DataFrame: No need to use `range` - `[next(letter_generator) for _ in df.iloc[0]]` – wwii Jun 03 '20 at 19:15
  • @wwii still calling `next()` a lot, while `zip()` would be much faster. – norok2 Jun 03 '20 at 19:18
  • @norok2 1.5 seconds vs 1.1 seconds for ten million items. – wwii Jun 03 '20 at 23:20
  • Another excellent answer from ap1997 with great refinements from norok2 and wwii. Thanks. Another approach to the offset would be `cycle('ABCAB'[offset:offset+3])`, which would also avoid slicing the dataframe. – Julian7 Jun 04 '20 at 05:22
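
The refinements discussed in the comments above can be combined into one short sketch. It uses `itertools.islice` (a swapped-in alternative to the `zip(range())` idiom) to skip `offset` letters and take exactly as many as there are rows, so no list slicing is needed. The helper name `cyclic_labels`, the column name `label`, and the example frame are hypothetical and only for illustration.

```python
from itertools import cycle, islice

import pandas as pd


def cyclic_labels(n, letters="ABC", offset=0):
    """Return n labels cycling through `letters`, starting at letters[offset]."""
    start = offset % len(letters)
    # islice skips the first `start` items of the cycle, then yields n more.
    return list(islice(cycle(letters), start, start + n))


# Hypothetical example frame; any existing DataFrame works the same way.
df = pd.DataFrame({"value": range(7)})
df["label"] = cyclic_labels(len(df), offset=1)
print(df["label"].tolist())  # ['B', 'C', 'A', 'B', 'C', 'A', 'B']
```

Because the list is assigned by position, this does not depend on what the DataFrame's index looks like.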

What I would do:

df['col'] = (df.index % 3).map({0: 'A', 1: 'B', 2: 'C'})
BENY
  • Perhaps `offset = 0; items = 'ABC'; n = len(items); {(i + offset) % n: x for i, x in enumerate(items)}` instead of `{0:'A',1:'B',2:'C'}`? – norok2 Jun 03 '20 at 19:52
  • YOBEN_S's answer is very clear and straightforward. norok2's comment is a good way to change the start point with an offset. Thanks both. – Julian7 Jun 04 '20 at 05:16
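
A possible variant of the modulo idea, sketched under the assumption that the index may not be the default 0, 1, 2, ... range: compute row positions with `np.arange(len(df))` instead of `df.index`, add the offset, and use the result to index an array of letters. Here `offset` is taken to mean the position of the first row's letter, which is a convention chosen for this sketch; the frame and its index are hypothetical.

```python
import numpy as np
import pandas as pd

letters = np.array(list("ABC"))
offset = 2  # 0 starts at 'A', 1 at 'B', 2 at 'C' (convention assumed here)

# Hypothetical frame with a non-integer index, where `df.index % 3` would not work.
df = pd.DataFrame({"value": range(5)}, index=list("vwxyz"))

# Row positions 0..n-1, shifted by the offset and wrapped around the letters.
positions = (np.arange(len(df)) + offset) % len(letters)
df["col"] = letters[positions]
print(df["col"].tolist())  # ['C', 'A', 'B', 'C', 'A']
```

This stays vectorized like the `map` version and works for any number of letters, since the modulus is `len(letters)` rather than a hardcoded 3.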