-1

I would like to create a new column in a data frame by searching an existing column for a letter, finding the digit that follows it, and copying the letter and digit into the new column. Example:

Month Sem_Year
2020-04-01 H1 2020
2020-05-01 2020 H1
2020-06-01 H1 2020
2020-07-01 H2 2020
2020-08-01 H2 2020
2020-09-01 2020 H2
2020-10-01 2020 H2
2020-11-01 H2 2020
2020-12-01 H2 2020
2021-01-01 H1 2021
2021-02-01 H1 2021

Now I want to search for the letter H in the second column and extract it together with the digit attached to it. Expected result:

Month Sem_Year Sem
2020-04-01 H1 2020 H1
2020-05-01 2020 H1 H1
2020-06-01 H1 2020 H1
2020-07-01 H2 2020 H2
2020-08-01 H2 2020 H2
2020-09-01 2020 H2 H2
2020-10-01 2020 H2 H2
2020-11-01 H2 2020 H2
2020-12-01 H2 2020 H2
2021-01-01 H1 2021 H1
2021-02-01 H1 2021 H1
Mufeez
  • Can you just split it and take the first item? Have you asked a question other than can someone do this for me? – wwii Nov 22 '22 at 15:08
  • Yup, I have used a similar approach. But there are a few instances where the letter is not the first character, for example the 2nd, 6th and 7th rows, so splitting gives an incorrect result. I apologize if you felt so, but I don't have much experience with Python and am still building my understanding. – Mufeez Nov 22 '22 at 15:55
  • The answer given below has been modified to meet this requirement and does not depend on split() – user19077881 Nov 22 '22 at 16:50

2 Answers

1

For the varied formats you have shown, you need a regular expression. Note that H\d means H followed by a single digit. This regex could be modified for other requirements.

df['Sem'] = df['Sem_Year'].str.extract(r"(H\d)")
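
As a minimal, self-contained sketch of the regex approach, the snippet below rebuilds a few rows of the question's data (the sample frame is an assumption for illustration, not the asker's full data set) and extracts the H-plus-digit token regardless of where it sits in the string. Passing expand=False is an optional choice here so that the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Small sample mirroring the question's layout (illustrative subset only)
df = pd.DataFrame({
    "Month": ["2020-04-01", "2020-05-01", "2020-07-01", "2020-09-01"],
    "Sem_Year": ["H1 2020", "2020 H1", "H2 2020", "2020 H2"],
})

# r"(H\d)" captures the letter H followed by one digit, wherever it appears;
# expand=False returns a Series instead of a one-column DataFrame
df["Sem"] = df["Sem_Year"].str.extract(r"(H\d)", expand=False)

print(df)
```

Rows with no match would get NaN in the new column, which may be worth checking for if the real data has other formats.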
user19077881
0

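You can use df.insert() to add a new column. To extract the letter and digit, loop through the values (column_value) in the second column and use "value_for_new_column = column_value.split(' ')[0]". Note that this only works for rows where the H token comes first (e.g. "H1 2020"); for rows such as "2020 H1" it returns the year instead.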
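
A rough sketch of the df.insert() route without regex might look like the following. Since some rows in the question put the year first, this version picks whichever whitespace-separated token starts with "H" instead of always taking index 0. The helper name pick_sem and the sample frame are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Small sample mirroring the question's layout (illustrative subset only)
df = pd.DataFrame({
    "Month": ["2020-04-01", "2020-05-01", "2020-07-01", "2020-09-01"],
    "Sem_Year": ["H1 2020", "2020 H1", "H2 2020", "2020 H2"],
})

def pick_sem(column_value):
    """Return the first space-separated token starting with 'H', or None."""
    for token in column_value.split(" "):
        if token.startswith("H"):
            return token
    return None

# Build the new column's values by looping over the second column
sem_values = [pick_sem(column_value) for column_value in df["Sem_Year"]]

# df.insert(loc, column, value): loc=len(df.columns) appends at the end
df.insert(len(df.columns), "Sem", sem_values)

print(df)
```

This is slower than the vectorized str.extract approach on large frames, but it avoids regular expressions entirely.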

Neil