0

I am working with a dataframe created by importing a .csv file I created. I want to (1) create a new column in the dataframe and (2) use values from an existing column to assign a value to the new column. This is an example of what I'm working with:

   date    id    height     gender
dd/mm/yyyy 1A      6          M
dd/mm/yyyy 2A      4          F
dd/mm/yyyy 1B      1          M
dd/mm/yyyy 2B      7          F

So I want to make a new column "side" and make that side have the value "A" or "B" based on the existing "id" column value:

   date    id    height     gender    side
dd/mm/yyyy 1A      6        M          A
dd/mm/yyyy 2A      4        F          A
dd/mm/yyyy 1B      1        M          B
dd/mm/yyyy 2B      7        F          B

I have gotten to a point where I have been able to make the new column and assign a new value but when I attempt to use the .groupby method on the "side" column it doesn't work as expected.

df = pd.read_csv("clean.csv")
df = df.drop(["Unnamed: 0"], axis=1)

df["side"] = ""

df.columns = ["date", "id", "height", "gender", "side"]

for i, row in df.iterrows():
    if "A" in row["id"]:
        df.at[i, row["side"]] = "A"
    else:
        df.at[i, row["side"]] = "B"

df["side"]

calling df["side"] results in blank output, but calling df by itself produces this: df shows a value in the side column

So there is a value in the dataframe, but using the .groupby method treats the values in the side column as not existing. This is a real headscratcher. I'm new to Python and would appreciate if someone could explain to me what I'm doing wrong.

m00saca
  • 363
  • 1
  • 7
  • 20

2 Answers2

3

Just use str[]. I could not see the image. If your id has more than 2 chars, you need this to get the last char

df['side'] = df.id.str[-1]


Out[582]:
         date  id  height gender side
0  dd/mm/yyyy  1A       6      M    A
1  dd/mm/yyyy  2A       4      F    A
2  dd/mm/yyyy  1B       1      M    B
3  dd/mm/yyyy  2B       7      F    B
Andy L.
  • 24,909
  • 4
  • 17
  • 29
  • 1
    his example has some numbers in the double digits, so it might just be easier to do: `df['side'] = df.id.str[-1]` to only select the A or B. – d_kennetz Jun 10 '19 at 23:10
  • 2
    @d_kennetz: I don't have access to the image(the image doesn't show on my browser), so I base it on the sample he gave. That's why everyone hates the attach image! I need to grab my phone to see what you mean. – Andy L. Jun 10 '19 at 23:14
  • 2
    Sorry for using the attach image here I wanted to make what I was working with extra clear. But I am really liking the string method you are pointing out. I was able to use `df.id.str[-1]` because of my double digit numbers as suggested by @d_kennetz – m00saca Jun 10 '19 at 23:16
  • 1
    @m00saca: no worries. I just want to point out the reason why I didn't use `str[-1]` at the first place. – Andy L. Jun 10 '19 at 23:20
2

1. Series.str

df['id'].str[-1:]

         date  id  height gender side
0  dd/mm/yyyy  1A       6      M    A
1  dd/mm/yyyy  2A       4      F    A
2  dd/mm/yyyy  1B       1      M    B
3  dd/mm/yyyy  2B       7      F    B

Or to be safe and more general.

2. str.extract with regex:

df['side'] = df['id'].str.extract('([A-Za-z])')

         date  id  height gender side
0  dd/mm/yyyy  1A       6      M    A
1  dd/mm/yyyy  2A       4      F    A
2  dd/mm/yyyy  1B       1      M    B
3  dd/mm/yyyy  2B       7      F    B

3. Str.slice

df['side'] = df['id'].str.slice(start=-1)

         date  id  height gender side
0  dd/mm/yyyy  1A       6      M    A
1  dd/mm/yyyy  2A       4      F    A
2  dd/mm/yyyy  1B       1      M    B
3  dd/mm/yyyy  2B       7      F    B
Erfan
  • 40,971
  • 8
  • 66
  • 78
  • Thanks for showing me all of the helpful string methods here. I am using the suggestion above from @d_kennetz. – m00saca Jun 10 '19 at 23:21