Add a new column to a dataframe based on an existing column value using pandas

Question

I am working with a dataframe created by importing a .csv file I created. I want to (1) create a new column in the dataframe and (2) use values from an existing column to assign a value to the new column. This is an example of what I'm working with:

   date    id    height     gender
dd/mm/yyyy 1A      6          M
dd/mm/yyyy 2A      4          F
dd/mm/yyyy 1B      1          M
dd/mm/yyyy 2B      7          F

So I want to make a new column "side" and make that side have the value "A" or "B" based on the existing "id" column value:

   date    id    height     gender    side
dd/mm/yyyy 1A      6        M          A
dd/mm/yyyy 2A      4        F          A
dd/mm/yyyy 1B      1        M          B
dd/mm/yyyy 2B      7        F          B

I have gotten to a point where I have been able to make the new column and assign a new value but when I attempt to use the .groupby method on the "side" column it doesn't work as expected.

df = pd.read_csv("clean.csv")
df = df.drop(["Unnamed: 0"], axis=1)

df["side"] = ""

df.columns = ["date", "id", "height", "gender", "side"]

for i, row in df.iterrows():
    if "A" in row["id"]:
        df.at[i, row["side"]] = "A"
    else:
        df.at[i, row["side"]] = "B"

df["side"]

calling df["side"] results in blank output, but calling df by itself produces this:

So there is a value in the dataframe, but using the .groupby method treats the values in the side column as not existing. This is a real headscratcher. I'm new to Python and would appreciate if someone could explain to me what I'm doing wrong.

Andy L. · Accepted Answer · 2019-06-10T23:18:48.737

3

Just use str[]. I could not see the image. If your id has more than 2 chars, you need this to get the last char

df['side'] = df.id.str[-1]


Out[582]:
         date  id  height gender side
0  dd/mm/yyyy  1A       6      M    A
1  dd/mm/yyyy  2A       4      F    A
2  dd/mm/yyyy  1B       1      M    B
3  dd/mm/yyyy  2B       7      F    B

edited Jun 10 '19 at 23:18

answered Jun 10 '19 at 23:06

Andy L.

24,909
4
17
29

1

his example has some numbers in the double digits, so it might just be easier to do: `df['side'] = df.id.str[-1]` to only select the A or B. – d_kennetz Jun 10 '19 at 23:10
2

@d_kennetz: I don't have access to the image(the image doesn't show on my browser), so I base it on the sample he gave. That's why everyone hates the attach image! I need to grab my phone to see what you mean. – Andy L. Jun 10 '19 at 23:14
2

Sorry for using the attach image here I wanted to make what I was working with extra clear. But I am really liking the string method you are pointing out. I was able to use `df.id.str[-1]` because of my double digit numbers as suggested by @d_kennetz – m00saca Jun 10 '19 at 23:16
1

@m00saca: no worries. I just want to point out the reason why I didn't use `str[-1]` at the first place. – Andy L. Jun 10 '19 at 23:20

score 2 · Answer 2 · answered Jun 10 '19 at 23:09

1. `Series.str`

df['id'].str[-1:]

         date  id  height gender side
0  dd/mm/yyyy  1A       6      M    A
1  dd/mm/yyyy  2A       4      F    A
2  dd/mm/yyyy  1B       1      M    B
3  dd/mm/yyyy  2B       7      F    B

Or to be safe and more general.

2. `str.extract` with `regex`:

df['side'] = df['id'].str.extract('([A-Za-z])')

         date  id  height gender side
0  dd/mm/yyyy  1A       6      M    A
1  dd/mm/yyyy  2A       4      F    A
2  dd/mm/yyyy  1B       1      M    B
3  dd/mm/yyyy  2B       7      F    B

3. `Str.slice`

df['side'] = df['id'].str.slice(start=-1)

         date  id  height gender side
0  dd/mm/yyyy  1A       6      M    A
1  dd/mm/yyyy  2A       4      F    A
2  dd/mm/yyyy  1B       1      M    B
3  dd/mm/yyyy  2B       7      F    B

Thanks for showing me all of the helpful string methods here. I am using the suggestion above from @d_kennetz. — m00saca, Jun 10 '19 at 23:21

Add a new column to a dataframe based on an existing column value using pandas

2 Answers2

1. Series.str

2. str.extract with regex:

3. Str.slice

1. `Series.str`

2. `str.extract` with `regex`:

3. `Str.slice`