Is there a way to create "suffixed-by-number" variables/columns in a Python data frame?

I am trying to replicate some of SAS' "first.", "last.", and "retain" capabilities in Python, but have not been successful. This is what my data set looks like:

Name     Grade
Lee      A
Lee      A+
Lee      A+
Lee      A+
Col      B+
Col      A
Col      B+

I would like to summarize the above information so that I only output one record per Name. This is the desired output:

Name    _Grade01    _Grade02    _Grade03    _Grade04
Lee     A            A+         A+          A+
Col     B+           A          B+

In SAS, the code would be:

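/* Assumes an input data set named have, in the row order shown above. */
/* The names have and want are placeholders. */
data want;
    set have;
    by Name notsorted;
    array _Grade{*} $2 _Grade01-_Grade04;  /* character array: grades are text */
    retain _Grade01-_Grade04;              /* keep values across rows */
    if first.Name then do;
        n = 0;
        do _i = 1 to 4;
            _Grade[_i] = ' ';
        end;
    end;

    n + 1;

    if n>0 and n <=4 then do;
        _Grade[n] = Grade;
    end;

    if last.Name then do;
        output;
    end;
run;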

Is there a way to efficiently replicate this in Python?

Thank you.

  • Python supports lists. You don't need `Grade1` to be defined as its own name when you can use `Grade[1]` (which is a syntax whereby `Grade` is the name of a list, and `1` is the index used to access a position in that list). The syntax for dicts is very similar. – Charles Duffy Sep 07 '22 at 19:39
  • Don't do this. Do not dynamically create variables, use a *container*, like a list or a dict – juanpa.arrivillaga Sep 07 '22 at 19:44
  • I'm not sure why they didn't just use `proc transpose`. I'm sure you can recreate this with `pivot` or `pivot_table`. – Stu Sztukowski Sep 07 '22 at 20:34
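
Following up on the `pivot` suggestion, here is a minimal sketch of one way this could look in pandas. It assumes the data is already in a DataFrame called `df` with the columns `Name` and `Grade` from the question; the helper column `seq` and the result name `wide` are made up for illustration. `groupby(...).cumcount()` plays the role of the SAS counter `n`, and `pivot` does the reshaping.

```python
import pandas as pd

# Sample data from the question (df is an assumed name).
df = pd.DataFrame({
    "Name": ["Lee", "Lee", "Lee", "Lee", "Col", "Col", "Col"],
    "Grade": ["A", "A+", "A+", "A+", "B+", "A", "B+"],
})

# Number each row within its Name group (1, 2, 3, ...), like the SAS counter n.
# "seq" is a hypothetical helper column.
df["seq"] = df.groupby("Name").cumcount() + 1

# Spread the grades into one column per sequence number.
wide = df.pivot(index="Name", columns="seq", values="Grade")

# Rename the columns 1, 2, ... to _Grade01, _Grade02, ...
wide.columns = [f"_Grade{i:02d}" for i in wide.columns]

# pivot sorts the index, so restore the order in which the names first appear,
# then turn Name back into a regular column.
wide = wide.reindex(df["Name"].unique()).rename_axis("Name").reset_index()

print(wide)
```

Groups with fewer rows than the longest group end up with missing values (NaN) in the trailing columns, so Col has no value in `_Grade04`. Unlike the SAS array, this creates as many columns as the largest group needs; to cap it at four, one option is to filter with `df[df["seq"] <= 4]` before pivoting.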
