I have a DataFrame (df) with 47 columns and 30,000 rows. The columns are listed below:

Index(['Unnamed: 0', 'CtpJobId', 'TransformJobStateId', 'LastError',
       'PriorityDate', 'QueuedTime', 'AccurateAsOf', 'SentToDevice',
       'StartedAtDevice', 'ProcessStart', 'LastProgressAt', 'ProcessEnd',
       'OutputFileDuration', 'Tags', 'SegmentId', 'VideoId',
       'ClipFirstFrameNumber', 'ClipLastFrameNumber', 'SourceId',
       'SourceNamedLocation', 'SourceDirectory', 'SourceFileSize',
       'srcMediaFormat', 'srcFrameRate', 'srcWidth', 'srcHeight', 'srcCodec',
       'srcDuration', 'TargetId', 'TargetNamedLocation', 'TargetDirectory',
       'TargetFilename', 'Description', 'TargetTags', 'tgtFrameRate',
       'tgtDropFrame', 'tgtWidth', 'tgtHeight', 'tgtCodec', 'DeviceType',
       'DeviceResourceId', 'AssignedDeviceId', 'DeviceName',
       'AssignedDeviceJobId', 'DeviceUri'],
      dtype='object')

I want to apply a function to selected columns of that DataFrame to create a new column, df['seg_duration']. My function is as follows:

def seq_duration(df):

    if ClipFirstFrameNumber is not None and ClipLastFrameNumber is not None:
        fn = ClipLastFrameNumber -ClipFirstFrameNumber
        if FrameRate =='23.98' and DropFrame == 'False' :
            fps = 24 / 1.001
        elif FrameRate == '24' and DropFrame == 'False':
            fps = 24
        elif FrameRate == '25'and DropFrame == 'False':
            fps = 25
        elif  FrameRate == '29.97':
            fps = 30 / 1.001
        elif  FrameRate == '30' and DropFrame == 'False':
            fps = 30
        elif FrameRate == '59.94':
            fps = 60 / 1.001
        Duration = fn/fps

    elif srcDuration is not None:
         Duration = srcDuration
    else:
        None

The function has three cases, and the first case has several conditions. First I subtract the ClipFirstFrameNumber column from the ClipLastFrameNumber column and store the result in the fn variable, then apply the frame-rate logic. srcDuration is likewise a column, and I use its value directly. Sample values are shown below:

ClipLastFrameNumber ClipFirstFrameNumber    tgtDropFrame    tgtFrameRate
NaN                    NaN                    True          29.97
NaN                    NaN                    True          29.97
NaN                    NaN                    True          29.97
34354.0                28892.0                True          29.97

When I apply this function as below

df['seg_duration']=df.apply(seq_duration)

I get the following error: NameError: ("name 'ClipFirstFrameNumber' is not defined", 'occurred at index Unnamed: 0')

Is this the right way to write a function for pandas? If not, how do I apply this function to the DataFrame to create the new column df['seg_duration']? Thanks in advance.

M Hossain
  • Multiple things going on here. `seq_duration` needs to be defined for a row, not a dataframe; it also needs to return something at the end. Then you want to apply the function with `axis = 1` passed to `apply`. – xyzjayne Jul 12 '18 at 16:37

1 Answer

Modifying your function a little:

def seq_duration(row):
    Duration = None
    if row.ClipFirstFrameNumber is not None and row.ClipLastFrameNumber is not None:
        fn = row.ClipLastFrameNumber -row.ClipFirstFrameNumber
        fps = 0
        if row.FrameRate =='23.98' and row.DropFrame == 'False' :
            fps = 24 / 1.001
        elif row.FrameRate == '24' and row.DropFrame == 'False':
            fps = 24
        elif row.FrameRate == '25'and row.DropFrame == 'False':
            fps = 25
        elif  row.FrameRate == '29.97':
            fps = 30 / 1.001
        elif  row.FrameRate == '30' and row.DropFrame == 'False':
            fps = 30
        elif row.FrameRate == '59.94':
            fps = 60 / 1.001
        if fps>0:
            Duration = fn/fps

    elif row.srcDuration is not None:
         Duration = row.srcDuration

    return Duration

Then you want:

df['seg_duration']=df.apply(seq_duration,axis = 1)
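
For context on why `axis = 1` matters: with the default `axis=0`, `apply` hands the function one column at a time, which is why the original error was reported at index 'Unnamed: 0' (a column label rather than a row number). Here is a minimal sketch on a made-up two-column frame; `toy` and `frame_count` are illustrative names, not part of the question's data:

```python
import pandas as pd

# Made-up frame reusing two column names from the question.
toy = pd.DataFrame({
    "ClipFirstFrameNumber": [10.0, 20.0],
    "ClipLastFrameNumber": [34.0, 68.0],
})

def frame_count(row):
    # With axis=1, `row` is a Series holding one row, indexed by column name.
    return row["ClipLastFrameNumber"] - row["ClipFirstFrameNumber"]

# Row-wise: one result per row of the frame.
by_row = toy.apply(frame_count, axis=1)

# Default axis=0: the function receives each column, so .name is a column label.
per_column = toy.apply(lambda col: col.name)

# axis=1: the function receives each row, so .name is the row's index label.
per_row = toy.apply(lambda row: row.name, axis=1)

print(by_row.tolist())
print(list(per_column))
print(list(per_row))
```

Inside a row-wise function, every value has to be read from `row` (for example `row["ClipFirstFrameNumber"]`); a bare column name is just an undefined Python variable.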
xyzjayne
  • Thank you so much for your quick response and for correcting my function, I really appreciate it. By the way, when I execute df['seg_duration']=df.apply(seq_duration,axis = 1), I get the error below: UnboundLocalError: ("local variable 'fps' referenced before assignment", 'occurred at index 0'). Any idea? Your help will save my day. – M Hossain Jul 12 '18 at 17:04
  • It's a case where `fps` is not defined but `fn` is defined... Are you sure your conditions have covered all the cases? – xyzjayne Jul 12 '18 at 17:06
  • Yes, the function should have only three cases. First, if row.ClipFirstFrameNumber is not None and row.ClipLastFrameNumber is not None, calculate the duration from those parameters. Second, if row.srcDuration is not None and both row.ClipFirstFrameNumber and row.ClipLastFrameNumber are None, use srcDuration. If neither case holds and all three are None, the duration is None. – M Hossain Jul 12 '18 at 17:43
  • Now it's giving an error saying UnboundLocalError: ("local variable 'Duration' referenced before assignment", 'occurred at index 0') – M Hossain Jul 12 '18 at 17:50
  • Eck sorry, fixed it. I put a double equal as opposed to a single equal sign. – xyzjayne Jul 12 '18 at 17:52
  • Now it's giving the old error: UnboundLocalError: ("local variable 'fps' referenced before assignment", 'occurred at index 0') – M Hossain Jul 12 '18 at 17:56
  • Added a line to define fps as 0. Try again – xyzjayne Jul 12 '18 at 17:58
  • It worked! I only set fps = 1.00 (arbitrary), because it's a floating-point number. Thank you so much @xyzjayne – M Hossain Jul 12 '18 at 18:40
  • I don't know why it's not taking the second case. If ClipFirstFrameNumber and ClipLastFrameNumber are null, the duration should be srcDuration, but it's not taking that value. Something is wrong with my function; can you please have a look one more time? @xyzjayne Thanks – M Hossain Jul 12 '18 at 18:50
  • Do you mean when they are not NaN as opposed to None? They are different. https://stackoverflow.com/questions/944700/how-can-i-check-for-nan-in-python – xyzjayne Jul 12 '18 at 18:56
  • Right. When ClipFirstFrameNumber and ClipLastFrameNumber are both None, it should take srcDuration as the Duration. srcDuration is present (not None), but it is giving NaN – M Hossain Jul 12 '18 at 19:10
  • So instead of using `is not None`, you need to check whether they are NaN – xyzjayne Jul 12 '18 at 19:11
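
To make that last point concrete, here is a hedged sketch of a NaN-aware version of the row function, using `pd.notna` in place of `is not None`. It rests on several assumptions that the thread does not confirm: that the frame rate and drop-frame flag live in the `tgtFrameRate` and `tgtDropFrame` columns shown in the sample table, that the rate can be converted with `float()`, and that the flag prints as "True"/"False". The `srcDuration` values and the names `seg_duration`, `EXACT_FPS` and `NON_DROP_ONLY` are made up for illustration.

```python
import pandas as pd

# Nominal rate -> exact frames per second, mirroring the branches in the question.
EXACT_FPS = {23.98: 24 / 1.001, 24.0: 24.0, 25.0: 25.0,
             29.97: 30 / 1.001, 30.0: 30.0, 59.94: 60 / 1.001}
# Rates that the original branches accept only when the drop-frame flag is false.
NON_DROP_ONLY = {23.98, 24.0, 25.0, 30.0}

def seg_duration(row):
    first, last = row["ClipFirstFrameNumber"], row["ClipLastFrameNumber"]
    # pd.notna is False for both NaN and None, unlike `is not None`.
    if pd.notna(first) and pd.notna(last):
        if pd.isna(row["tgtFrameRate"]):
            return None
        rate = float(row["tgtFrameRate"])              # assumption: numeric or numeric string
        is_drop = str(row["tgtDropFrame"]) == "True"   # assumption: bool or "True"/"False"
        fps = EXACT_FPS.get(rate)
        if fps is None or (is_drop and rate in NON_DROP_ONLY):
            return None  # no matching branch, as in the answer above
        return (last - first) / fps
    if pd.notna(row["srcDuration"]):
        return row["srcDuration"]
    return None

# Two rows shaped like the sample in the question; srcDuration values are invented.
df = pd.DataFrame({
    "ClipFirstFrameNumber": [float("nan"), 28892.0],
    "ClipLastFrameNumber": [float("nan"), 34354.0],
    "tgtDropFrame": [True, True],
    "tgtFrameRate": [29.97, 29.97],
    "srcDuration": [12.5, 99.0],
})
df["seg_duration"] = df.apply(seg_duration, axis=1)
print(df["seg_duration"])
```

The first row has NaN frame numbers, so it falls through to `srcDuration`; the second row is computed from the frame count and the exact 29.97 rate.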