Inputting new column data based on existing column data in a CSV file

Question

Hello, I am still very new to Python and am not very adept at programming.

My relevant data looks something like this:

Specimen	Fam_Genus
1	A
2	B
3	F
4	G
5	U
6	A
7	B
8	D

The real data has about 4000 specimens. Since the data is old, the genus is not up to modern standards, as many of them are now subclasses of the family genus.

So I want to create a new column that shows the modern genus, based on the Fam_Genus data.

Like:

Specimen	Fam_Genus	Modern Genus
1	A	A
2	B	B
3	F	C
4	G	C
5	U	A
6	A	A
7	B	B
8	D	C

I tried:

df["Modern_Genus"] = ""
data['Modern_Genus'] = np.where(data.Fam_Genus.str.contains("A"), "A")

But I get this error back: ValueError: either both or neither of x and y should be given

From what I found online, this seems to be the best way, but as I said, I am new to Python, especially NumPy and pandas. Any ideas or suggestions on what I am doing wrong?

score 0 · Answer 1 · answered Aug 13 '21 at 21:55

Besides the condition, np.where needs two more arguments: the value to use where the condition is met and the value to use where it is not. Either both or neither must be given, which is what the error message is telling you. Try this instead:

data['Modern_Genus'] = np.where(data.Fam_Genus.str.contains("A"), "A", "B")

"B" is the value that will be provided if data.Fam_Genus does not contain "A".
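To cover the whole mapping from the question rather than a single letter, the calls to np.where can be nested, with the original column as the final fallback. This is only a minimal sketch, assuming the sample data from the question; the names `to_c` and `to_a` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (the real file has about 4000 rows).
data = pd.DataFrame({
    "Specimen": [1, 2, 3, 4, 5, 6, 7, 8],
    "Fam_Genus": ["A", "B", "F", "G", "U", "A", "B", "D"],
})

# Hypothetical helper masks: which old genera map to "C" and which to "A".
to_c = data["Fam_Genus"].isin(["F", "G", "D"])
to_a = data["Fam_Genus"].isin(["U"])

# np.where(condition, value_if_true, value_if_false); the inner call handles
# the second rule and falls back to the unchanged original value.
data["Modern_Genus"] = np.where(
    to_c, "C",
    np.where(to_a, "A", data["Fam_Genus"]),
)

print(data)
```

Note that isin tests for an exact match, whereas str.contains("A") would also match any longer name that merely contains the letter A.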

score 0 · Answer 2 · answered Aug 13 '21 at 21:57

You can use replace with a dictionary of the new mappings:

df['Modern_Genus'] = df['Fam_Genus'].replace({'F': 'C',
                                              'G': 'C',
                                              'U': 'A',
                                              'D': 'C',
                                             })

output:

   Specimen Fam_Genus Modern_Genus
0         1         A            A
1         2         B            B
2         3         F            C
3         4         G            C
4         5         U            A
5         6         A            A
6         7         B            B
7         8         D            C
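Since the question is about a CSV file, here is a rough end-to-end sketch of reading the data, adding the column with replace, and writing it back out. It assumes the file is tab-separated like the sample in the question; io.StringIO stands in for the real file so the example runs on its own, and `raw`, `genus_map` and `out` are illustrative names only.

```python
import io

import pandas as pd

# Stand-in for the real file; with an actual file you would pass its path
# to read_csv instead of a StringIO object.
raw = "Specimen\tFam_Genus\n1\tA\n2\tB\n3\tF\n4\tG\n5\tU\n6\tA\n7\tB\n8\tD\n"
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Old genus -> modern genus; values not listed here are left unchanged.
genus_map = {"F": "C", "G": "C", "U": "A", "D": "C"}
df["Modern_Genus"] = df["Fam_Genus"].replace(genus_map)

# Write the result back out; index=False keeps the row index out of the file.
out = io.StringIO()
df.to_csv(out, sep="\t", index=False)
result = out.getvalue()
print(result)
```

A similar result can be had with df["Fam_Genus"].map(genus_map).fillna(df["Fam_Genus"]), which is worth trying if replace turns out to be slow on a large mapping.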
