I have a function that takes a list like this:

list1 = ['A1', 'A2', 'A3', 'A4', 'A5', 'A7', 'A8']

The function finds the entries missing from the sequence and adds them back to the list. It then takes that list and inserts the missing entries into a pandas dataframe.

I have broken it up into 3 functions. remove_chars strips the characters from each entry in the list (assuming that there are n characters in front of the digits in each entry). missing_elements finds any numbers that are missing from the list and makes a new list of those numbers (for list1 above, missing_elements would return [6], as that is the number missing from the list). Finally, insert_into_df uses the output from missing_elements to insert the missing number(s) into the dataframe where they should be (the dataframe has a bunch of columns that are labeled like list1, and some of those columns may be missing). Here is what it looks like:

import re

# Function to remove strings from questions
# Input: list of strings such as 'A1'; output: list with only the ints
def remove_chars(L1):

    if len(L1) > 0:
        for i, j in enumerate(L1):
            L1[i] = re.sub('[^0-9]','', j)
            L1[i] = int(L1[i])
        return L1
    else:
        return

# Function to pick out missing numbers in lists
# This is used to ensure that each column list contains no deleted columns
def missing_elements(L1, start = None, end = None):

    if end is None and start is None:
        if len(L1) > 0:
            newlist1 = remove_chars(L1)
            start = 0
            end = len(newlist1) - 1
        else:
            return

    start, end = newlist1[0], newlist1[-1]
    return sorted(set(range(start, end + 1)).difference(newlist1))

# Function to insert missing sequential columns into dataframe
def insert_into_df(L, df):

    """
    insert_into_df: Inserts columns missing from dataframes into dataframe at the 
    proper index so that the inserted columns are in the correct order. This
    function is only to be used for dataframes containing sequential columns.
    ----
    Parameters:
        L: The list of column names that may contain a missing column
        df: The dataframe into which these columns will be inserted
    """

    tempList = list(L)

    if len(L) > 0:
        stringL0 = str(re.sub(r'\d+', '', tempList[0]))
        mList1 = missing_elements(L)

        if len(mList1) > 0:
            for i in range(len(mList1)):
                df.insert(loc = mList1[i], column = stringL0 + str(mList1[i]), value = 0)
        else:
            return df

        return df

    else: 
        return df

When I put in a print statement, it seems to output the proper dataframe, yet upon exporting it as a csv, it seems to have applied the remove_chars function to every column header, and the file just contains a bunch of numbers in sequential order as headers.
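A quick check that may help narrow this down: the sketch below reuses the loop from remove_chars above on a hypothetical list named `cols` and compares it with a hypothetical non-mutating variant, `remove_chars_copy`. Both names are assumptions made for illustration and are not part of the code above. It only shows that remove_chars overwrites the list it is given; whether that same list is later used for the dataframe headers depends on calling code that is not shown here.

```python
import re

def remove_chars(L1):
    # Same logic as in the question: each entry of L1 is overwritten in place
    for i, j in enumerate(L1):
        L1[i] = int(re.sub('[^0-9]', '', j))
    return L1

def remove_chars_copy(names):
    # Hypothetical non-mutating variant: builds and returns a new list
    return [int(re.sub('[^0-9]', '', name)) for name in names]

cols = ['A1', 'A2', 'A3', 'A4', 'A5', 'A7', 'A8']

# The copying variant leaves the caller's list alone
numbers = remove_chars_copy(cols)
print(cols)             # still the original strings

# The in-place variant returns the very same list object it was given
mutated = remove_chars(cols)
print(mutated is cols)  # True
print(cols)             # the strings have been replaced by ints
```

If the list passed in as L is the same object that is later used for the column headers, the second case would be consistent with headers turning into plain numbers.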

Can anyone tell me why this is happening and what to do? If you need more clarification, let me know.

DrakeMurdoch
  • Because `L1` and `L2` are *the same list*, `L`. Anything you do to any of those names changes the underlying list for all of them. See [how to copy lists in python](https://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa) – TemporalWolf Jun 14 '18 at 19:28
  • Possible duplicate of [How to clone or copy a list?](https://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list) – TemporalWolf Jun 14 '18 at 21:52
  • I fixed the question to not be a duplicate and try and actually solve the problem I am having. – DrakeMurdoch Jun 15 '18 at 17:14
  • This question would benefit from having an example of input/output that you get versus what you expected. Also, in future cases, it's better to open a new question instead of editing the old one if you have a new question, if for no other reason than it's not going to show up on the new questions feed and so is much less likely to be answered. – TemporalWolf Jun 15 '18 at 17:55
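
Following the copy suggestion in the comments above, here is a minimal sketch of how the missing labels could be worked out without modifying the input list. The helper names `split_name`, `missing_numbers`, and `full_column_names` are hypothetical, and the sketch assumes every label is a non-digit prefix followed by digits (as in `A7`); it is not the asker's code.

```python
import re

def split_name(name):
    # Split a label such as 'A7' into its non-digit prefix and its number
    match = re.fullmatch(r'(\D*)(\d+)', name)
    if match is None:
        raise ValueError(f'unexpected column label: {name!r}')
    return match.group(1), int(match.group(2))

def missing_numbers(names):
    # Numbers absent between the smallest and largest label; names is not modified
    numbers = [split_name(n)[1] for n in names]
    if not numbers:
        return []
    return sorted(set(range(min(numbers), max(numbers) + 1)) - set(numbers))

def full_column_names(names):
    # Complete, ordered label list; assumes all labels share the first label's prefix
    if not names:
        return []
    prefix = split_name(names[0])[0]
    numbers = [split_name(n)[1] for n in names]
    return [f'{prefix}{k}' for k in range(min(numbers), max(numbers) + 1)]

cols = ['A1', 'A2', 'A3', 'A4', 'A5', 'A7', 'A8']
missing = missing_numbers(cols)   # [6]
full = full_column_names(cols)    # 'A1' through 'A8', including 'A6'
print(missing, full)
print(cols)                       # unchanged: still the original strings
```

If the dataframe contains only these sequential columns, something like `df.reindex(columns=full_column_names(cols), fill_value=0)` may be enough to add the missing ones in order; that is an assumption about the dataframe layout, which the question does not show.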

0 Answers