
I have a pandas DataFrame, loaded from a csv, structured as follows: [image]

Whoever created the csv made some mistakes, and I need to move the first date that appears in each row to the "Opening Date" column. The final result should be:

[image]

How can I do it without specifying which column to extract the date from? (The only information I have is that it is the first date after the "Opening Date" column.)

Babbara
    please provide your data in a reproducible format (ideally as dictionary or DataFrame constructor) – mozway Nov 21 '22 at 17:31

1 Answer


I went for a deliberately explicit, step-by-step approach.

First, we need a function that recognizes whether a value is a date. I couldn't tell whether the dates in your csv follow a specific format, so to be safe we will use a function that accepts any date pattern.

Check out 'Check if string has date, any format':

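from dateutil.parser import parse

def is_date(string, fuzzy=False):
    """Return True if the value can be interpreted as a date."""
    try:
        parse(string, fuzzy=fuzzy)
        return True
    except (ValueError, TypeError, OverflowError):
        # TypeError covers non-string cells such as NaN
        return False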
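As a quick check of what such a function accepts, here is a minimal sketch that repeats the function so it runs on its own. The sample strings are illustrative and not taken from your csv.

```python
from dateutil.parser import parse


def is_date(string, fuzzy=False):
    """Return True if dateutil can interpret the string as a date."""
    try:
        parse(string, fuzzy=fuzzy)
        return True
    except (ValueError, TypeError, OverflowError):
        return False


print(is_date("1990-12-1"))                        # True
print(is_date("xyz_not_a_date"))                   # False
print(is_date("today is 2019-03-27"))              # False
print(is_date("today is 2019-03-27", fuzzy=True))  # True
```

With fuzzy=True the parser skips words it does not recognize, so leave it off unless the cells contain free text around the date.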

At this point, we can iterate over each row of your dataframe and, where there is no value in the date column, search the columns after it for the first date.

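import pandas as pd

date_col = "Opening Data"  # name used in the example below; change it to match your csv
sub_cols = df.columns[df.columns.get_loc(date_col) + 1:]  # only the columns after date_col

for index, row in df.iterrows():
    value = row[date_col]
    if pd.isna(value) or value == "":  # the date is missing (NaN or empty string)
        for col in sub_cols:
            if isinstance(row[col], str) and is_date(row[col]):
                df.loc[index, date_col] = row[col]  # .loc writes to df itself, not to a copy
                df.loc[index, col] = ""
                break  # move only the first date found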

Starting from a dataset of this form:

Opening Data col_0 col_1
0 01-01-2000 00:00:00
1 02-01-2000 00:00:00
2 03-01-2000 00:00:00

the output will be:

Opening Data col_0 col_1
0 01-01-2000 00:00:00
1 02-01-2000 00:00:00
2 03-01-2000 00:00:00
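
For a reproducible version of the steps above, here is a self-contained sketch. The helper name move_first_date and the sample values are made up for illustration, and it assumes missing cells are either NaN or empty strings.

```python
import pandas as pd
from dateutil.parser import parse


def is_date(value):
    """Return True only for strings that dateutil can parse as a date."""
    if not isinstance(value, str):
        return False
    try:
        parse(value)
        return True
    except (ValueError, OverflowError):
        return False


def move_first_date(df, date_col):
    """Fill empty cells of date_col with the first date found to its right."""
    out = df.copy()  # leave the caller's dataframe untouched
    later_cols = out.columns[out.columns.get_loc(date_col) + 1:]
    for idx in out.index:
        current = out.at[idx, date_col]
        if not (pd.isna(current) or current == ""):
            continue  # the date is already in the right place
        for col in later_cols:
            if is_date(out.at[idx, col]):
                out.at[idx, date_col] = out.at[idx, col]
                out.at[idx, col] = ""
                break  # only the first date is moved
    return out


# Made-up sample: the date sits in a different column on each row.
df = pd.DataFrame({
    "Opening Data": ["", "", "03-01-2000 00:00:00"],
    "col_0": ["01-01-2000 00:00:00", "", ""],
    "col_1": ["", "02-01-2000 00:00:00", ""],
})
fixed = move_first_date(df, "Opening Data")
print(fixed)
```

Keep in mind that dateutil is quite permissive (a bare number such as "12" may be accepted as a date), so if your csv uses one known format, a stricter check against that format is probably safer.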
Giuseppe La Gualano