4

I have multiple column data frame with columns ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable'].

In the energy supply column, I want to convert the unit of the column to Peta from Giga. But in the process energy['Energy Supply']*= energy['Energy Supply'], when the value is like "...." (missing value is denoted by this), is also getting multiplied or say duplicated. Also, the string value in the column is also getting multiplied. (For eg original: Peta, after operation: PetaPetaPetaPeta...).

To stop this from happening, I am running this:

energy = pd.read_excel("Energy Indicators.xls",skiprows = 16, skip_footer = 38)
energy.drop(['Unnamed: 0','Unnamed: 1'],axis = 1, inplace = True)
energy.columns = ['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
for i in energy['Energy Supply']:
    if (isinstance(energy[i],int) == True):
        energy['Energy Supply'][i]=energy['Energy Supply'][i]*1000000
return (energy)

But I am not getting the result i.e. to change the value of integer type variables only, and nothing is changing.

Where I think the problem lies in, the first two rows will give thefalse condition, as first rows are "String" and based on that, the program is not modifying the values, whereas I want to individually check if the value is of integer type and if it is, Multiplies the number by 1,000,000.

Input:

    Country        Energy Supply    Energy Supply per Capita    % Renewable
0   NaN             Petajoules            Gigajoules                 %
1   Afghanistan        321                   10                  78.6693
2   Albania            102                   35                    100
3   Algeria            1959                  51                  0.55101
4   American Samoa      ...                 ...                  0.641026

Expected Output:

    Country        Energy Supply    Energy Supply per Capita    % Renewable
0   NaN             Petajoules            Gigajoules                 %
1   Afghanistan        3210000                10                     78.6693
2   Albania            1020000                35                      100
3   Algeria            19590000               51                     0.55101
4   American Samoa      ...                 ...                    0.641026

Current Output:

    Country        Energy Supply    Energy Supply per Capita    % Renewable
0   NaN             PetajoulesPeta.         Gigajoules               %
1   Afghanistan        3210000                10                   78.6693
2   Albania            1020000                35                    100
3   Algeria            19590000               51                   0.55101
4   American Samoa      ........                ...                0.641026
Vishnudev Krishnadas
  • 10,679
  • 2
  • 23
  • 55
Sajal
  • 89
  • 1
  • 14
  • Example input and desired output please. – John Zwinck Oct 21 '18 at 07:58
  • Can you provide some input data and desired output? You can read [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples/20159305#20159305) – Sai Kumar Oct 21 '18 at 07:59
  • @JohnZwinck Please check, I did the necessary changes – Sajal Oct 21 '18 at 08:11

2 Answers2

2

You can use str.isnumeric to check if a string is numeric and then multiply.

energy['Energy Supply'] = energy['Energy Supply'].apply(lambda x: int(x) * 1000000 if str(x).isnumeric() else x)

print (energy)

    Country         Energy Supply   Energy Supply per Capita    % Renewable
0             NaN    Petajoules           Gigajoules                     %
1     Afghanistan    321000000                10                   78.6693
2         Albania    102000000                35                       100
3         Algeria    1959000000               51                   0.55101 
4  American Samoa        ...                  ..                  0.641026
Abhi
  • 4,068
  • 1
  • 16
  • 29
  • Thanks for your reply, but when tried this code, it is showing that int object has no attribute name 'isnumeric', `AttributeError: 'int' object has no attribute 'isnumeric'`. Any idea about this? – Sajal Oct 21 '18 at 09:30
  • Thanks a lot, it is working now, but can you explain why another answer (on this post) is not working and what you have done is working. As far as i can understand you are iterating through the column and converting all of the values to string, and then checking if the string is numerical or not – Sajal Oct 21 '18 at 09:34
  • @Sajal I don't think there's a method called `is_integer()` available for strings. Where as `Series.str.isnumeric` checks each row in the series for numeric and If found multiply it accordingly. – Abhi Oct 21 '18 at 09:40
1

This worked for me with a million values:

import pandas as pd
import numpy as np 

data = {"Energy Supply":[1,30,"Petajoules",5,70]*2000000}

energy = pd.DataFrame(data)

input:

Energy Supply
0                   1
1                  30
2          Petajoules
3                   5
4                  70
5                   1
6                  30
7          Petajoules
8                   5
9                  70
10                  1
11                 30
12         Petajoules
13                  5
14                 70
15                  1
16                 30
17         Petajoules
18                  5
19                 70
20                  1
21                 30
22         Petajoules
23                  5
24                 70
25                  1
26                 30
27         Petajoules
28                  5
29                 70
              ...
[10000000 rows x 1 columns]

Then i transform the Series into an array and set the values:

arr = energy["Energy Supply"].values

for i in range(len(arr)):
    if isinstance(arr[i],int):
        arr[i] = arr[i]*1000000
    else:
        pass

The output looks like this:

        Energy Supply
0             1000000
1            30000000
2          Petajoules
3             5000000
4            70000000
5             1000000
6            30000000
7          Petajoules
8             5000000
9            70000000
10            1000000
11           30000000
12         Petajoules
13            5000000
14           70000000
15            1000000
16           30000000
17         Petajoules
18            5000000
19           70000000
20            1000000
21           30000000
22         Petajoules
23            5000000
24           70000000
25            1000000
26           30000000
27         Petajoules
28            5000000
29           70000000
              ...
[10000000 rows x 1 columns]

This solution is about twice as fast as an apply:

Looping through an array:

loop: 100%|██████████| 10000000/10000000 [00:07<00:00, 1376439.75it/s]

Using Apply:

apply: 100%|██████████| 10000000/10000000 [00:14<00:00, 687420.00it/s]

If you convert the series to numeric then the string values become nan values. Using np.where you need about 5 seconds for both converting the series to numeric and multiplying the values:

import pandas as pd
import numpy as np 
import time

data = {"Energy Supply":[1,30,"Petajoules",5,70]*2000000}

energy = pd.DataFrame(data)
t = time.time()

energy["Energy Supply"] = pd.to_numeric(energy["Energy Supply"],errors="coerce")

energy["Energy_Supply"] = np.where((energy["Energy Supply"]%1==0),energy["Energy Supply"]*100,energy["Energy Supply"])
t1 = time.time()
print(t1-t)
5.275099515914917

But you could also simply do this after using pd.to_numeric():

energy["Energy Supply"] = energy["Energy Supply"]*1000000
Lukas Humpe
  • 421
  • 3
  • 10
  • I tried to run this. `AttributeError: 'Series' object has no attribute 'is_integer'`. I do not think is_integer works on this. – Sajal Oct 21 '18 at 08:15
  • I tried `energy['Energy Supply'] = np.where(isinstance(energy['Energy Supply'],int), energy['Energy Supply']*1000000, energy['Energy Supply'])` , but still nothing is happening – Sajal Oct 21 '18 at 08:26
  • Can you do the same by using the `.where` method? – Sajal Oct 21 '18 at 09:40
  • That is only possible if you drop the string values, which if they do not contain information you could drop anyway. – Lukas Humpe Oct 21 '18 at 09:49