I am trying to take row means of every few columns. Here is a sample dataset.
d = {'2000-01': range(0,10), '2000-02': range(10,20), '2000-03': range(10,20),
'2001-01': range(10,20), '2001-02':range(5,15), '2001-03':range(5,15)}
pd.DataFrame(data=d)
2000-01 2000-02 2000-03 2001-01 2001-02 2001-03
0 0 10 10 10 5 5
1 1 11 11 11 6 6
2 2 12 12 12 7 7
3 3 13 13 13 8 8
4 4 14 14 14 9 9
5 5 15 15 15 10 10
6 6 16 16 16 11 11
7 7 17 17 17 12 12
8 8 18 18 18 13 13
9 9 19 19 19 14 14
I need to take row means of the first three columns and then the next three and so on in the complete dataset. I don't need the original columns in the new dataset. Here is my code. It works but with caveats (discussed below). I am searching for a cleaner, more elegant solution if possible. (New to Python/Pandas)
#Create empty list to store row means
d1 = []
#Run loop to find row means for every three columns
for i in np.arange(0, 6, 3):
data1 = d.iloc[:,i:i+3]
d1.append(data1.mean(axis=1))
#Create empty list to concat DFs later
dlist1 =[]
#Concat DFs
for j in range(0,len(d1)):
dlist1.append(pd.Series(d1[j]).to_frame())
pd.concat(dlist1, axis = 1)
I get this output, which is correct:
0 0
0 6.666667 6.666667
1 7.666667 7.666667
2 8.666667 8.666667
3 9.666667 9.666667
4 10.666667 10.666667
5 11.666667 11.666667
6 12.666667 12.666667
7 13.666667 13.666667
8 14.666667 14.666667
9 15.666667 15.666667
The columns names can easily be fixed, but the problem is that I need them in a specific format and I have 65 of these columns in the actual dataset. If you'll notice the column names in the original dataset, they are '2000-01'; '2000-02'; '2000-03'
. The 1,2 and 3 are months of the year 2000, therefore column 1 of the new df should be '2000q1'
, q1 being quarter 1. How do I loop over column names to create this for all my new columns? This seems significantly more challenging (at least to me!) than what's shown here. Thanks for your time!
EDIT: Ok this has been solved, quick shoutout to everyone who contributed!