I have a function that takes a list like this:
```python
list1 = ['A1', 'A2', 'A3', 'A4', 'A5', 'A7', 'A8']
```
It finds the missing entries (here `A6`), adds them back, and then inserts the result into a pandas DataFrame.
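The two core steps described here, stripping the prefix and then finding the gaps, can be sketched so that the input list is never modified. This is a minimal sketch, not the code from this question: `strip_prefix` and `find_missing` are hypothetical names, and it assumes every entry is a prefix followed only by digits. It uses the smallest and largest number as the range bounds, so the input does not need to be sorted.

```python
import re


def strip_prefix(names):
    """Return a NEW list holding the integer part of each name; `names` is not modified."""
    return [int(re.sub(r"[^0-9]", "", name)) for name in names]


def find_missing(numbers):
    """Return the integers absent from the range min(numbers)..max(numbers), inclusive."""
    if not numbers:
        return []
    present = set(numbers)
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in present]


list1 = ["A1", "A2", "A3", "A4", "A5", "A7", "A8"]
print(find_missing(strip_prefix(list1)))  # [6]
print(list1)  # still ['A1', 'A2', 'A3', 'A4', 'A5', 'A7', 'A8']
```

Because `strip_prefix` builds a new list with a comprehension instead of assigning back into `names`, the caller's column names keep their original string values.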
I have broken it up into 3 functions:

- `remove_chars` strips the characters off each entry in the list (assuming each entry has some characters in front of its digits);
- `missing_elements` finds any numbers missing from the list and returns a new list of them (for `list1` above, `missing_elements` would return `[6]`, since 6 is the number missing from the list);
- `insert_into_df` uses the output of `missing_elements` to insert the missing column(s) into the DataFrame where they belong (the DataFrame has a bunch of columns labeled like `list1`, and some of them may be missing).

Here is what it looks like:
```python
import re


# Function to remove strings from questions
# Input: list of strings such as 'A1'; output: list with only ints
def remove_chars(L1):
    if len(L1) > 0:
        # Overwrites each element of L1 (the caller's list) with its digits as an int
        for i, j in enumerate(L1):
            L1[i] = re.sub('[^0-9]', '', j)
            L1[i] = int(L1[i])
        return L1
    else:
        return


# Function to pick out missing numbers in lists
# This is used to ensure that each column list contains no deleted columns
def missing_elements(L1, start=None, end=None):
    if len(L1) == 0:
        return []
    # remove_chars converts L1 itself, not a copy
    newlist1 = remove_chars(L1)
    # Default the range to the first and last numbers (assumes L1 is sorted)
    if start is None:
        start = newlist1[0]
    if end is None:
        end = newlist1[-1]
    return sorted(set(range(start, end + 1)).difference(newlist1))


# Function to insert missing sequential columns into dataframe
def insert_into_df(L, df):
    """
    insert_into_df: Inserts columns missing from dataframes into dataframe at the
    proper index so that the inserted columns are in the correct order. This
    function is only to be used for dataframes containing sequential columns.
    ----
    Parameters:
    L: The list of column names that may contain a missing column
    df: The dataframe into which these columns will be inserted
    """
    tempList = list(L)
    if len(L) > 0:
        # Prefix shared by the column names, e.g. 'A' from 'A1'
        stringL0 = str(re.sub(r'\d+', '', tempList[0]))
        mList1 = missing_elements(L)
        if len(mList1) > 0:
            for i in range(len(mList1)):
                # The missing number itself is used as the positional index
                df.insert(loc=mList1[i], column=stringL0 + str(mList1[i]), value=0)
        return df
    else:
        return df
```
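For the insertion step, one possible alternative is to build the complete, gap-free list of expected names from the existing ones, again without modifying the input. This is a sketch under the assumption that all names share a single prefix followed by digits; `complete_columns` is a hypothetical helper, not part of pandas.

```python
import re


def complete_columns(names):
    """Return a new list with any missing sequential names filled in; `names` is not modified."""
    if not names:
        return []
    prefix = re.sub(r"\d+", "", names[0])  # e.g. 'A' from 'A1'
    numbers = [int(re.sub(r"[^0-9]", "", n)) for n in names]
    # Rebuild every name between the smallest and largest number
    return [f"{prefix}{k}" for k in range(min(numbers), max(numbers) + 1)]


cols = ["A1", "A2", "A3", "A4", "A5", "A7", "A8"]
print(complete_columns(cols))  # ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8']
print(cols)  # unchanged
```

If the frame contains only these columns, the completed list could be passed to `df.reindex(columns=..., fill_value=0)`, which adds the missing columns filled with 0 in that order. `reindex` drops any column not in the list, so when other columns are present, positional `df.insert` calls remain the more targeted option.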
When I add a print statement, the DataFrame looks correct, yet when I export it to CSV, `remove_chars` seems to have been applied to every column header: the header row is just a sequence of numbers.
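One mechanism that can produce this kind of symptom: a function that assigns to `L1[i]` changes the very list object it was given, so the caller sees the change too. In `insert_into_df`, `tempList` is a copy, but `missing_elements` is called with `L` itself, so if that same list object is later used as the header it will hold integers. The demonstration below uses a simplified copy of `remove_chars`; the variable names are illustrative only.

```python
import re


def remove_chars(L1):
    # Simplified copy of the question's remove_chars: overwrites L1's elements in place
    for i, j in enumerate(L1):
        L1[i] = int(re.sub('[^0-9]', '', j))
    return L1


shared = ["A1", "A2", "A3", "A5"]
result = remove_chars(shared)
print(shared)            # [1, 2, 3, 5] -- the caller's list was changed
print(result is shared)  # True -- both names refer to one list object

safe = ["A1", "A2", "A3", "A5"]
result_copy = remove_chars(list(safe))  # pass a shallow copy instead
print(safe)         # ['A1', 'A2', 'A3', 'A5'] -- original names survive
print(result_copy)  # [1, 2, 3, 5]
```

Passing a copy (`list(L)` or `L[:]`) or building a new list inside the function keeps the original column names intact.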
Can anyone tell me why this is happening and what to do about it? If you need more clarification, let me know.