
I have a pandas DataFrame loaded from a CSV file, and I want to add three columns to it (Python 3.8):

  1. Add a column that converts meters to miles (length is in meters; the new column will be length_miles).

  2. Add a column that converts meters to feet (elevation_gain is in meters; the new column will be elevation_gain_feet).

  3. Add a column that computes a difficulty rating as follows: NPS difficulty rating = elevation gain (in feet) x 2 x distance (in miles). The square root of that product is the numerical rating.

The numerical rating then needs to be broken down further into a difficulty rating of 1-5. The current difficulty rating in the data set is not informative, so I want to use the National Park Service rating.

If the numerical difficulty rating is:

  • under 50, the difficulty rating is 1
  • 50-100, the difficulty rating is 2
  • 101-150, the difficulty rating is 3
  • 151-200, the difficulty rating is 4
  • above 200, the difficulty rating is 5

Ideally this would compute and just put the number 1-5 in the column, but having 2 new columns for #3 would be fine as well.
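
To make the formula concrete, here is a rough sketch of the arithmetic for a single trail. The 10,000 m length and 500 m elevation gain are made-up values for illustration, not rows from my data, and the conversion factors are the usual meters-to-miles and meters-to-feet constants.

```python
import math

# Hypothetical trail (made-up values, not taken from the data set)
length_m = 10_000.0
elevation_gain_m = 500.0

# Unit conversions: meters -> miles, meters -> feet
length_miles = length_m * 0.000621371
elevation_gain_feet = elevation_gain_m * 3.28084

# Numerical rating: square root of (elevation gain in feet x 2 x distance in miles)
score = math.sqrt(elevation_gain_feet * 2 * length_miles)
print(round(score, 1))  # about 142.8, which lands in the 101-150 band (rating 3)
```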

Here are the columns from my DataFrame and values from a couple of rows. I have not yet thought about how to produce the NPS 1-5 ratings in the DataFrame; I am not sure whether I can, or whether I need to do it outside the DataFrame in a function. Unfortunately, my code does not seem to add the columns the way I want, so I think I must be doing something wrong. The DataFrame code I have so far:

df = pd.read_csv('data.csv')
df.assign(length_miles = lambda x: x['length'] * 0.00062137, axis = 1)
df.assign(elevation_gain_ft = lambda x: x['elevation_gain'] * 3.28084, axis = 1)
df.assign(num_dif_rating = lambda x: np.sqrt( x['length_miles'] * 2 * x['elevation_gain_ft'], axis = 1))
hiker
  • Please show a minimal reproducible example: https://stackoverflow.com/help/minimal-reproducible-example – ombk Nov 27 '20 at 22:20
  • Welcome to Stack Overflow. Please consider checking out [what topics can you ask about here](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask). SO is for solving specific programming problems: you should provide the code you have tried so far, describe the problem, and show the effort you made to solve it yourself first (what you have read before, etc.). I hope this explains why people downvoted your question. – Ruli Nov 27 '20 at 23:04
  • Sure, let me know if this is better. Since I am still having issues with this, maybe it will help me resolve it. – hiker Nov 28 '20 at 01:59
  • @hiker Yes, now it looks way better. If possible, though, do not upload images of data; check out [how to create minimal reproducible example from pandas DF](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Ruli Nov 28 '20 at 09:25

2 Answers


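You need to use the assign method. Note that assign returns a new DataFrame instead of modifying df in place, so store the result. It also has no axis parameter; passing axis = 1 would just create a column named axis:

df = df.assign(YourColumn = lambda x: conversion_formula(x['Meters']))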

Here's the link to the documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html
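
As a rough sketch of how this could look for the three columns in the question: the small DataFrame below is made-up sample data standing in for the CSV, and the column names are taken from the question. It relies on assign returning a new DataFrame and on later keyword arguments being able to use columns created earlier in the same call.

```python
import numpy as np
import pandas as pd

# Made-up sample rows standing in for the CSV (lengths and gains in meters)
df = pd.DataFrame({"length": [10000.0, 3000.0], "elevation_gain": [500.0, 50.0]})

# assign returns a new DataFrame, so the result is stored back in df.
# Each lambda receives the DataFrame built so far, so num_dif_rating
# can use the two converted columns defined just before it.
df = df.assign(
    length_miles=lambda x: x["length"] * 0.000621371,
    elevation_gain_feet=lambda x: x["elevation_gain"] * 3.28084,
    num_dif_rating=lambda x: np.sqrt(x["elevation_gain_feet"] * 2 * x["length_miles"]),
)
print(df)
```

Calling df.assign(...) without storing the result leaves df unchanged, which would explain why the columns never appear.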

Good luck!

Vitmz

I got it to work like this. It cleaned up the data just the way I need.

import numpy as np
import pandas as pd

def difficulty_rating(x, y):
    # numerical rating: square root of (miles * feet * 2)
    res = np.sqrt(x * y * 2)
    # compare against upper bounds only, so that values between bands
    # (for example 100.5) do not fall through to rating 5
    if res < 50:
        return 1
    elif res <= 100:
        return 2
    elif res <= 150:
        return 3
    elif res <= 200:
        return 4
    else:
        return 5

def data_cleanup():
    df = pd.read_csv('AllTrails data.csv')
    # convert meters to miles and feet and add columns
    df['length_miles'] = df['length'] * 0.000621371
    df['elevation_gain_feet'] = df['elevation_gain'] * 3.28084
    # compute the 1-5 rating row by row
    df['nps_difficulty_rating'] = df.apply(lambda x: difficulty_rating(x.length_miles, x.elevation_gain_feet), axis=1)
    df.to_csv('np trails.csv')

data_cleanup()
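
The same banding can probably be done without a row-wise apply. The following is only a sketch, not what I actually ran: add_nps_rating is a hypothetical helper name, the sample rows are made up, and it swaps in np.select for the if/elif chain. np.select takes the first condition that is true for each row, so the order of the conditions matters.

```python
import numpy as np
import pandas as pd

def add_nps_rating(df):
    """Return a copy of df with converted units and a 1-5 rating column."""
    out = df.copy()
    out["length_miles"] = out["length"] * 0.000621371
    out["elevation_gain_feet"] = out["elevation_gain"] * 3.28084
    score = np.sqrt(out["elevation_gain_feet"] * 2 * out["length_miles"])
    # First matching condition wins: <50 -> 1, <=100 -> 2, <=150 -> 3, <=200 -> 4
    conditions = [score < 50, score <= 100, score <= 150, score <= 200]
    # Anything that matches no condition (above 200, or NaN) gets the default 5
    out["nps_difficulty_rating"] = np.select(conditions, [1, 2, 3, 4], default=5)
    return out

# Made-up sample rows (meters), chosen to hit each of the five bands
sample = pd.DataFrame({
    "length": [3000.0, 5000.0, 10000.0, 12000.0, 20000.0],
    "elevation_gain": [50.0, 250.0, 500.0, 600.0, 1000.0],
})
print(add_nps_rating(sample)["nps_difficulty_rating"].tolist())
```

Note that rows with a missing length or elevation gain would end up with the default rating of 5 here, so they may need to be filtered or handled separately first.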

hiker