
This is my dataframe

I have tried this but it didn't work:

df1['quarter'].str.contains('/^[-+](20)$/', re.IGNORECASE).groupby(df1['quarter'])

Thanks in advance

desertnaut

1 Answer


Hi and welcome to the forum! If I understood your question correctly, you want to form groups per year?

Of course, since you already have a year column, you can simply group by it directly.
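A minimal sketch of that direct approach. Since the original dataframe isn't shown here, this assumes the year column is literally called 'year' and that you want the sum of a hypothetical 'some_value' column per year; adjust the names to your data.

```python
import pandas as pd

# Hypothetical data: assumes the real dataframe already has a 'year' column
# next to 'quarter' (the asker's actual columns are not shown).
df = pd.DataFrame({
    'quarter': ['1947q1', '1947q2', '1948q1'],
    'year': [1947, 1947, 1948],
    'some_value': [1, 3, 5],
})

# Group on the existing year column and sum each group's values
totals = df.groupby('year')['some_value'].sum()
print(totals)
```

Swap .sum() for .mean(), .count(), etc. depending on what you need per year.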

If you didn't have the year column, you could group by the quarter string minus its last 2 characters (the "q1" to "q4" suffix). Here is an example with a toy dataset I created for this answer:

import pandas as pd

d = {'quarter': pd.Series(['1947q1', '1947q2', '1947q3', '1947q4', '1948q1']),
     'some_value': pd.Series([1, 3, 2, 4, 5])}

df = pd.DataFrame(d)
print(df)

This is our toy dataframe:

  quarter  some_value
0  1947q1           1
1  1947q2           3
2  1947q3           2
3  1947q4           4
4  1948q1           5

Now we simply group by the year, which we get by dropping the last 2 characters of each quarter string:

grouped = df.groupby(df.quarter.str[:-2])

for name, group in grouped:
    print(name)
    print(group, '\n')

Output:

1947
  quarter  some_value
0  1947q1           1
1  1947q2           3
2  1947q3           2
3  1947q4           4 

1948
  quarter  some_value
4  1948q1           5 
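Since you tried a regex: str.contains only returns True/False per row (whether the pattern matches), so it can't give you the year to group on. Also, Python patterns are plain strings, so the /.../ delimiters from JavaScript are treated as literal slash characters. A sketch of a regex-based alternative using str.extract, on the same toy data and assuming every quarter looks like "YYYYqN":

```python
import re
import pandas as pd

df = pd.DataFrame({
    'quarter': ['1947q1', '1947q2', '1947q3', '1947q4', '1948q1'],
    'some_value': [1, 3, 2, 4, 5],
})

# No /.../ delimiters in Python; the capture group grabs the 4-digit year
# in front of "q1".."q4" (IGNORECASE also accepts "Q1".."Q4").
years = df['quarter'].str.extract(r'^(\d{4})q[1-4]$', flags=re.IGNORECASE, expand=False)

# Group by the extracted year and sum the values of each year
per_year = df.groupby(years)['some_value'].sum()
print(per_year)
```

Rows that don't match the pattern get NaN as their year and are dropped by groupby by default, which can help spot malformed entries.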

Additional comment: the .str[:-2] part uses slicing, an operation you can apply to any Python string. For example:

s = 'Hi there, Dhruv!'

# Prints the first 2 characters of the string
print(s[:2])
# Output: "Hi"

# Prints everything from index 3 onwards (i.e. after the first three characters)
print(s[3:])
# Output: "there, Dhruv!"

# Prints the characters at indices 10 to 14 (the stop index 15 is excluded)
print(s[10:15])
# Output: "Dhruv"
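The slice used in the answer, [:-2], has a negative stop index, which counts from the end of the string. A small sketch showing it on a single string and, through pandas' .str accessor, on a whole column:

```python
import pandas as pd

q = '1947q1'

# A negative stop index counts from the end: drop the last 2 characters
year = q[:-2]
print(year)  # 1947

# .str[...] applies the same slice to every element of a pandas Series
quarters = pd.Series(['1947q1', '1948q3'])
print(quarters.str[:-2].tolist())  # ['1947', '1948']
```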
Guillermo Mosse