How to find an alphabet and extract the alphabet and the number tagged along with it in Pandas?

Question

I would like to create a new column in the data frame by searching an existing column for a letter. Once the letter is found, I want to take the number that follows it and copy the letter and number together into the new column. Example:

Month	Sem_Year
2020-04-01	H1 2020
2020-05-01	2020 H1
2020-06-01	H1 2020
2020-07-01	H2 2020
2020-08-01	H2 2020
2020-09-01	2020 H2
2020-10-01	2020 H2
2020-11-01	H2 2020
2020-12-01	H2 2020
2021-01-01	H1 2021
2021-02-01	H1 2021

Now I want to search for the letter H in the second column and extract the letter together with the number attached to it. Example:

Month	Sem_Year	Sem
2020-04-01	H1 2020	H1
2020-05-01	2020 H1	H1
2020-06-01	H1 2020	H1
2020-07-01	H2 2020	H2
2020-08-01	H2 2020	H2
2020-09-01	2020 H2	H2
2020-10-01	2020 H2	H2
2020-11-01	H2 2020	H2
2020-12-01	H2 2020	H2
2021-01-01	H1 2021	H1
2021-02-01	H1 2021	H1

Can you just split it and take the first item? Have you asked a question other than can someone do this for me? — wwii, Nov 22 '22 at 15:08
Yup, I have used a similar approach. But there are a few instances where the letter is not the first character, for example the 2nd, 6th and 7th rows, so splitting gives an incorrect result. I apologize if it came across that way. I don't have much experience with Python and am still building my understanding. — Mufeez, Nov 22 '22 at 15:55
Modified answer to meet this requirement given below and does not depend on split() — user19077881, Nov 22 '22 at 16:50

user19077881 · Accepted Answer · 2022-11-22T16:28:10.800

1

For the varied formats you have described, you need a regular expression. The pattern H\d means the letter H followed by a single digit, and the parentheses mark the part to capture. The regex can be adapted for other requirements.

df['Sem'] = df['Sem_Year'].str.extract(r"(H\d)")
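As a minimal sketch of the line above, here is a self-contained version run on a few rows from the question. The sample frame is rebuilt by hand for illustration, and `expand=False` is an optional addition so that `str.extract` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# A few rows from the question, covering both "H1 2020" and "2020 H1" layouts.
df = pd.DataFrame({
    "Month": ["2020-04-01", "2020-05-01", "2020-07-01", "2020-09-01", "2021-01-01"],
    "Sem_Year": ["H1 2020", "2020 H1", "H2 2020", "2020 H2", "H1 2021"],
})

# r"(H\d)" captures the letter H followed by one digit, wherever it appears.
# The raw string avoids Python treating "\d" as a string escape.
df["Sem"] = df["Sem_Year"].str.extract(r"(H\d)", expand=False)

print(df)
```

Rows that contain no match would get NaN in the new column.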

edited Nov 22 '22 at 16:28

answered Nov 22 '22 at 14:57

user19077881


Yup this works only if the location of "H" is fixed i.e. either at the start or end. I have edited the sample data frame. Can you please check and help me? – Mufeez Nov 22 '22 at 15:56
Answer modified to meet the newly stated requirement. – user19077881 Nov 22 '22 at 16:28
Can you also help me with extracting year like 2021 in a separate column using similar query please? – Mufeez Nov 23 '22 at 07:34
Currently, I am writing df['Year'] = df['Time Grain'].str.extract("(20\d\d)") similar to yours. – Mufeez Nov 23 '22 at 07:45
If your data is in DateTime format and not a string then you need to use a different approach. – user19077881 Nov 23 '22 at 10:23
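A rough sketch of the two cases discussed in these comments, assuming the column names from the question (`Sem_Year` as text, `Month` as a date) rather than the `Time Grain` column mentioned above. For strings, the same `str.extract` pattern works; for a datetime column, the `.dt.year` accessor is one option.

```python
import pandas as pd

df = pd.DataFrame({
    "Month": pd.to_datetime(["2020-05-01", "2020-09-01", "2021-01-01"]),
    "Sem_Year": ["2020 H1", "2020 H2", "H1 2021"],
})

# Text column: capture a four-digit year starting with "20". The result is a string.
df["Year_from_text"] = df["Sem_Year"].str.extract(r"(20\d\d)", expand=False)

# Datetime column: no regex needed, the year is available as an integer.
df["Year_from_date"] = df["Month"].dt.year

print(df)
```

If the date column holds strings instead, converting it first with `pd.to_datetime` makes the `.dt` accessor available.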

score 0 · Answer 2 · answered Nov 22 '22 at 14:59

You can use df.insert() to add a new column. To extract the letter and number, loop through the values (column_value) in the second column and use value_for_new_column = column_value.split(' ')[0]. Note that this returns the semester only when it comes first in the string.
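A sketch of this loop-based idea, with one change that is not part of the answer as written: instead of always taking the first item of the split, it takes the first token that starts with "H", so rows such as "2020 H1" are handled too. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Month": ["2020-04-01", "2020-05-01", "2020-09-01"],
    "Sem_Year": ["H1 2020", "2020 H1", "2020 H2"],
})

sem_values = []
for column_value in df["Sem_Year"]:
    tokens = column_value.split(" ")
    # Pick the first token beginning with "H"; None if the row has no such token.
    sem_values.append(next((t for t in tokens if t.startswith("H")), None))

# Insert the new column at the end of the frame.
df.insert(len(df.columns), "Sem", sem_values)

print(df)
```

The vectorised `str.extract` approach is usually shorter and faster on large frames, but the loop can be easier to follow when starting out.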

Neil

