0

Hello below please view my VBA code that works. I am trying to re-write it in Pandas, but my Pandas script is not working half way in ( My Pandas attempted script is below the VBA) Could anyone help me finish this off if possible ( which I think it is)

Sub mymacro()
Columns(19).Replace "DFHD", "SFD"
Columns(19).Replace "DFBG", "SFD"
Columns(19).Replace "DFVD", "SFD"
Columns(19).Replace "MFUB", "BFD"
Columns(19).Replace "MFBD", "BFD"
Columns(19).Replace "DFBD", "BFD"
Columns(19).Replace "UFNC", "CFD"
Columns(19).Replace "UFNC", "CFD"
Columns(19).Replace "BFYD", "BFD"
'Having trouble starting below here'
Columns("T:AC").Select
    Selection.EntireColumn.Hidden = True
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=19, Criteria1:=Array( _
        "U*"), Operator:=xlFilterValues
    ActiveWindow.SmallScroll Down:=-100
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=30, Criteria1:=Array( _
        "350", "B*"), Operator:=xlFilterValues
    ActiveWindow.SmallScroll Down:=-100
    Range("S3").Select
    ActiveCell.FormulaR1C1 = "BD"
    Range("S3").Select
    Selection.Copy
    Range(Selection, Selection.End(xlDown)).Select
    ActiveSheet.Paste
    Range("S3").Select
    Application.CutCopyMode = False
    ActiveSheet.ShowAllData
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=19, Criteria1:="=UND", Operator:=xlOr, Criteria2:="=UNH"
    ActiveWindow.SmallScroll Down:=-21
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=30, Criteria1:=Array( _
     "DR9", "DV0", "DV5", "DV8", "DV9", "DVG", "DV*"), Operator:=xlFilterValues
    ActiveWindow.SmallScroll Down:=-36
    Range("S11").Select
    ActiveCell.FormulaR1C1 = "SD"
    Range("S11").Select
    Selection.Copy
    Range(Selection, Selection.End(xlDown)).Select
    ActiveSheet.Paste
    Range("S11").Select
    Application.CutCopyMode = False
    ActiveSheet.ShowAllData
    ActiveWindow.SmallScroll Down:=-10
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=19, Criteria1:="UNH"
    ActiveWindow.SmallScroll Down:=-27
    Range("S1815").Select
    ActiveCell.FormulaR1C1 = "FUHD"
    Range("S1815").Select
    Selection.Copy
    Range(Selection, Selection.End(xlDown)).Select
    ActiveSheet.Paste
    Range("S1815").Select
    Application.CutCopyMode = False
    ActiveWindow.SmallScroll Down:=-30
    ActiveSheet.ShowAllData
    ActiveWindow.SmallScroll Down:=-240

Below is my Pandas script, note where I start having trouble is commented , because the first 12 lines of code work great.

import pandas as pd
import numpy as np
data = pd.read_excel("orsthrufirstarticledeltion.xlsx", encoding = "ISO-8859-1", dtype=object)
data.loc[data.Format == 'DFHD', 'Format'] = 'SFD'
data.loc[data.Format == 'DFBG', 'Format'] = 'SFD'
data.loc[data.Format == 'DFVD', 'Format'] = 'SFD'
data.loc[data.Format == 'MFUB', 'Format'] = 'BFD'
data.loc[data.Format == 'MFBD', 'Format'] = 'BFD'
data.loc[data.Format == 'DFBD', 'Format'] = 'BFD'
data.loc[data.Format == 'UFNC', 'Format'] = 'CFD'
data.loc[data.Format == 'BFYD', 'Format'] = 'BFD'

# Trouble starts below
data.loc[(data["Fmt"] != str) & (data["Format"] == "UN*"), "Format"] = 'BD' # the UN* did not work 
#data.loc[(data["Fmt"] == '350') & (data["Format"] == "UNB"), "Format"] = 'BD'
#data.loc[(data["Fmt"] != str) & (data[data.Format.str.startswith('UN',na=False)]), "Format"] = 'BD'
#
writer = pd.ExcelWriter('mstrplc2.xlsx', engine='xlsxwriter')
data.to_excel(writer, sheet_name='Sheet1')
writer.save()

----- New Attempt At Getting A Solution ---------

Below please view the sample dataframe with the raw data we will start with, I have the code to export into excel if you wish.

import pandas as pd

startdf = pd.DataFrame({'Column_A':['DFHD', 'DFBG', 'DFVD', 'MFUB', 'MFBD', 'DFBD', 'UFNC', 'UFNC', 'BFYD',
                                    'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],

'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})



writer = pd.ExcelWriter('testdf.xlsx', engine='xlsxwriter')
    startdf.to_excel(writer, sheet_name='Sheet1')

The First step will be to take all of the values in Column A and replace the existing values with the new values as listed below (so were just editing Column A)

  • "DFHD" -> "SFD" "DFBG"->"SFD" "DFVD"-> "SFD" "MFUB"-> "BFD" "MFBD"-> "BFD" "DFBD"->"BFD" "UFNC"-> "CFD" "UFNC"->"CFD" "BFYD"-> "BFD"

After writing in this logic the data should look like this:

df2 = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                            'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],
'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})

Now we will continue to edit just Column A, but use value in Column B to dictate what Column A values should be, so think of each value row by row. First filter out SFD, BFD and CFD from Column A, so the remaining values will be 'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'. For these remaining values, we will look to column B to decide how to change what is in column A. Logic below:

  1. Values starting with B or that are a number in Column B, should mean that the matching row value in Column A should now change to BFD
  2. Values starting with D or OPT in Column B should mean that the matching row value in Column A should now change to SFD
  3. Values starting with U or that are number in Column B, should mean that the matching row value in Column A should now change to UHFD

After this logic the final output dataframe should be

     resultdf = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                                     'BFD', 'SFD', 'SFD', 'SFD', 'SFD', 'BFD','UHFD', 'UHFD', 'BFD', 'BFD', 'BFD', 'SFD','BFD', 'UHFD', 'SFD'],
    'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})

writer = pd.ExcelWriter('finalresult.xlsx', engine='xlsxwriter')
        resultdf.to_excel(writer, sheet_name='Sheet1')
0004
  • 1,156
  • 1
  • 14
  • 49
  • A large proportion of your vba code can be removed. I assume you used the macro recorder as you have a lot of superfluous actions. You also want to see [this](https://stackoverflow.com/questions/10714251/how-to-avoid-using-select-in-excel-vba) for removing your use of .Select – QHarr Feb 23 '19 at 00:20
  • @QHarr , I believe that, but how about translating it to Pandas, the second portion. I am trying to never use that VBA again, so not to worried about the sperfluous VBA – 0004 Feb 23 '19 at 00:22
  • 1
    Removing unneeded code from the VBA script in your questioon will make it easier for people here to understand. This will increase your chance of getting a useful answer. – Xukrao Feb 23 '19 at 00:58
  • I assume that what you mean by "did not work" doesn't mean a runtime error, but that the intended data modification didn't happen, right? – im_chc Feb 23 '19 at 01:39
  • Yes @im_chc I’m having a tough time reading the pandas doc for I guess conditional logic – 0004 Feb 23 '19 at 01:41
  • For "str" in the python code, is it just a string variable or is it a predefined constant / enum? If it's the former I couldn't see it declared in any part of the code – im_chc Feb 23 '19 at 01:45
  • The former. I put that in because values in the column are either a string or a int so I wanted to just look at the values that were not str when applying the logic @im_chc – 0004 Feb 23 '19 at 01:50
  • I'm not familiar with python let alone pandas, but I think the problem might have something to do with the selection criterion – im_chc Feb 23 '19 at 06:44
  • Like, do a print (data.loc[(data["Fmt"] != str) & (data["Format"] == "UN*"), "Format"]) to see if it comes up with something; if it hasn't (which I guess so) try to be less specific on the criteria like print (data.loc[(data["Fmt"] != str), "Format"]) (... or even print (data.loc[:, "Format"])... not sure but I think it should dump the whole column of "Format") – im_chc Feb 23 '19 at 06:58
  • Can you provide an xls with some sample data in a link? With expected result. – QHarr Feb 23 '19 at 12:10
  • @QHarr when I get back to my computer. – 0004 Feb 23 '19 at 19:54
  • Do you have Any updates? – im_chc Feb 25 '19 at 23:33
  • @im_chc sorry very busy at work right now and data is on machine but can’t take time to put onto here. – 0004 Feb 25 '19 at 23:41
  • @im_chc thank your for patience , please view my edited post to see addl info. Thanks again for help!! – 0004 Feb 27 '19 at 03:07
  • @QHarr -- see edits, new logic and example data. Let me know if you can be of any further help. Also, in the code you can export data into xlsx! :) – 0004 Feb 27 '19 at 03:07
  • Is the asterisk in ```data["Format"] == "UN*"``` meant as a wildcard? If it is, then you need to either use regex or the fnmatch module – kerwei Feb 27 '19 at 03:44
  • @kerwei yes it is. – 0004 Feb 27 '19 at 05:29

2 Answers2

0

Right now your conditional filter is looking for the literal 'UN*' against the 'Format' column. To use asterisk as a wildcard, you can use the fnmatch module.

import fnmatch

data.loc[(data["Fmt"] != str) & (data["Format"].apply(lambda x: fnmatch.fnmatch(x, 'UN*')), "Format"] = 'BD'
kerwei
  • 1,822
  • 1
  • 13
  • 22
0

Still having one issue, when using this on live data from excel, my Column_B imports in to a dataframe as an "object" it contains mostly strings but some numerical values, for example '350', and the logic does not work for said int value ... any reason why?

Got it to work with this code: data.loc[data.Fmt .astype(str) == '350', 'Fm'] = 'test' All, below is an answer that appears to be working, (the order of each line is important)

But is there a more pythonic way achieve this, i.e. using a wildcard ? The lended answers above for a wildcard solution did not work, so please view lengthy solution below:

import pandas as pd

startdf = pd.DataFrame({'Column_A':['DFHD', 'DFBG', 'DFVD', 'MFUB', 'MFBD', 'DFBD', 'UFNC', 'UFNC', 'BFYD',
                                    'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],

'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})
#writer = pd.ExcelWriter('testdf.xlsx', engine='xlsxwriter')
#df.to_excel(writer, sheet_name='Sheet1')
#writer.save()

df2 = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                            'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],
'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})


resultdf = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                                 'BFD', 'SFD', 'SFD', 'SFD', 'SFD', 'BFD','UHFD', 'UHFD', 'BFD', 'BFD', 'BFD', 'SFD','BFD', 'UHFD', 'SFD'],
'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})

test = startdf

test.loc[test.Column_A == 'DFHD', 'Column_A'] = 'SFD'
test.loc[test.Column_A == 'DFBG', 'Column_A'] = 'SFD'
test.loc[test.Column_A == 'DFVD', 'Column_A'] = 'SFD'
test.loc[test.Column_A == 'MFUB', 'Column_A'] = 'BFD'
test.loc[test.Column_A == 'MFBD', 'Column_A'] = 'BFD'
test.loc[test.Column_A == 'DFBD', 'Column_A'] = 'BFD'
test.loc[test.Column_A == 'UFNC', 'Column_A'] = 'CFD'
test.loc[test.Column_A == 'BFYD', 'Column_A'] = 'BFD'

test.loc[test.Column_B == '357', 'Column_A'] = 'BFD'
test.loc[test.Column_B == '350', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'B50', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'B25', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'BVG', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'BUG', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'DVG', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV9', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV5', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV8', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV0', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DBG', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'U66', 'Column_A'] = 'UHFD'
test.loc[test.Column_B == 'U1C', 'Column_A'] = 'UHFD'
test.loc[test.Column_B == 'UVG', 'Column_A'] = 'UHFD'

finaldf = test 
0004
  • 1,156
  • 1
  • 14
  • 49