I have the following data frame:

Data frame screenshot

and I want to convert it to this format:

[screenshot of the desired output format]

I have succeeded in doing that with get_dummies, but I am trying to do the same by defining a function like:

def func():
    if per_country['confirmed']=='confirmed':
        return per_country['cases']
    else:
        return 0

per_country['virus_confirmed']=per_country['type'].apply(func)

The per_country dataframe is the first screenshot.

But I am getting this error:

[screenshot of the error message]

What am I doing wrong?

shadowtalker
  • There is no `cases` column in your original data frame, so your function would fail even if it were written correctly. – shadowtalker Mar 27 '20 at 15:25
  • You have the error `TypeError: func() takes 0 positional arguments but 1 was given`. When you run `apply` (last line) with `func` as its argument, `apply` passes one argument to `func`. – Catalina Chircu Mar 28 '20 at 07:03

3 Answers

When you use apply, it implicitly passes each record of per_country['type'] to func. If you want to do this more simply and clearly, you can use a lambda function.

per_country['virus_confirmed'] = per_country.apply(lambda x: x['cases'] if x['type']=="confirmed" else 0, axis=1)

EDIT: For the apply function, note that you're applying it to a series in the DataFrame. This means that x is each record from that column and that you don't need to specify the axis. I applied it to the whole DataFrame, which means I do need to specify the axis. With axis=1, x in my lambda function stands in for each row of the dataframe.
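To make the difference between the two apply calls concrete, here is a minimal sketch. The per_country frame below is a hypothetical stand-in (the real one is only shown in the screenshot), so the column names country, type and cases are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for per_country; the real column names may differ.
per_country = pd.DataFrame({
    "country": ["Italy", "Italy", "Spain"],
    "type": ["confirmed", "death", "confirmed"],
    "cases": [100, 5, 80],
})

# Series.apply: the function receives one value of the column at a time,
# so no axis argument is needed.
is_confirmed = per_country["type"].apply(lambda value: value == "confirmed")

# DataFrame.apply with axis=1: the function receives one row at a time,
# so several columns of that row can be read inside the function.
virus_confirmed = per_country.apply(
    lambda row: row["cases"] if row["type"] == "confirmed" else 0, axis=1
)

print(is_confirmed.tolist())
print(virus_confirmed.tolist())
```

The first call can only see the type value, which is why the question's function could not reach the cases column from it.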

Lastly, I also forgot to mention that you can get this done even quicker with the dummy variables by doing:

per_country['virus_confirmed'] = per_country['cases']*per_country['confirmed']

Because rows that aren't confirmed are marked with a 0 in the confirmed dummy column, it will give you the same result, but in a vectorized manner. If you're working with a Covid-19 dataset, I don't think you'll notice a huge difference, but it's a good habit to look for opportunities to vectorize.
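As a rough end-to-end sketch of that vectorized route, assuming the dummy columns come from get_dummies on a type column (the sample data and the country, type and cases names are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for per_country; the real column names may differ.
per_country = pd.DataFrame({
    "country": ["Italy", "Italy", "Spain"],
    "type": ["confirmed", "death", "confirmed"],
    "cases": [100, 5, 80],
})

# One 0/1 column per distinct value of "type". The cast to int keeps the
# result numeric whether get_dummies returns booleans or small integers.
dummies = pd.get_dummies(per_country["type"]).astype(int)
per_country = per_country.join(dummies)

# Element-wise product: cases are kept where the dummy is 1, zeroed elsewhere.
per_country["virus_confirmed"] = per_country["cases"] * per_country["confirmed"]

print(per_country)
```

No Python-level function is called per row here, which is what makes this version vectorized.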

LTheriault

func is a function that takes 0 parameters. apply passes data to the first parameter of the function. This obviously fails because func does not accept any parameters.
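The failure can be reproduced in a few lines. This is only an illustration on a made-up Series; no_params and one_param are hypothetical names, not functions from the question.

```python
import pandas as pd

types = pd.Series(["confirmed", "death"])

def no_params():
    # Accepts nothing, like the func in the question.
    return 0

def one_param(value):
    # Accepts the single value that apply passes in.
    return 1 if value == "confirmed" else 0

try:
    types.apply(no_params)
    error_message = None
except TypeError as exc:
    # apply called no_params(value), which the signature does not allow.
    error_message = str(exc)

flags = types.apply(one_param)

print(error_message)
print(flags.tolist())
```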

You seem confused about the basics of Python scoping. You can consult here for an introduction.

In short, you are trying to write a function that is applied to each row of the data frame. Therefore your function func must accept a row as its parameter.

# Each row will be passed as the "x" parameter

def func(x):
    if x['confirmed'] == 'confirmed':
        return x['cases']
    else:
        return 0

# axis=1 is required to apply the function row-wise

per_country['virus_confirmed'] = per_country.apply(func, axis=1)

That said, the function you're asking about doesn't seem related to the data you're showing, so this whole question is a bit confusing.

shadowtalker

Just on another note: one-hot encoding is appropriate for categorical data where no ordinal relationship exists between categories. It represents each categorical variable as a binary vector with one element per unique label, marking the class label with a 1 and all other elements with a 0, which is what you are trying to do. The scikit-learn library provides OneHotEncoder to one-hot encode one or more variables automatically.
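For reference, a small sketch of OneHotEncoder on a made-up type column (the data is an assumption, not the asker's frame). The encoder returns a sparse matrix by default, so the example converts it to a dense array for display.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single-column frame; the encoder expects 2-D input.
types = pd.DataFrame({"type": ["confirmed", "death", "confirmed"]})

encoder = OneHotEncoder()
# fit_transform learns the distinct labels and encodes each row as a
# binary vector; toarray() turns the sparse result into a dense array.
encoded = encoder.fit_transform(types).toarray()

# categories_ holds the labels in the same order as the output columns.
labels = list(encoder.categories_[0])

print(labels)
print(encoded)
```

Unlike get_dummies, the fitted encoder can be reused to transform new data with the same set of categories.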

Nandu Raj