I'm looking to insert information into an existing dataframe. Its shape is 2001 rows × 13 columns, but only the first column has data.

I have 12 more columns, but they do not have the same dimensions as the main dataframe, so I'd like to insert these additional columns into the main one using a conditional. Example dataframe:

[image: example dataframe]

This is an example. I want to insert the var column into the 2001 × 13 dataframe, using the date as the condition; where there is no matching date, it should skip the row or simply add a 0.
I'm really new to Python and programming in general.

Zephyr
  • Is the first column the date? – Hadus Jun 16 '20 at 18:52
  • Can you not just remove the rows with empty date? – Hadus Jun 16 '20 at 18:54
  • Does this answer your question? [Python Pandas update a dataframe value from another dataframe](https://stackoverflow.com/questions/49928463/python-pandas-update-a-dataframe-value-from-another-dataframe) – Chris Jun 16 '20 at 18:55

1 Answer

Without a minimal working example it is hard to give you clear recommendations, but I think what you are looking for is the .loc indexer of a pd.DataFrame. I would recommend the following:

  • Selection of rows with .loc works better in your case if the dates are first converted to date-time, so a first step is to make this conversion as:
import pandas as pd

# Pandas is quite smart about guessing the date format. If this fails, pass an
# explicit format= string to pd.to_datetime; see
# https://docs.python.org/3/library/datetime.html to learn more about format strings.
df['date'] = pd.to_datetime(df['date'])

# Make this the index of your data frame.
df.set_index('date', inplace=True)
  • It is not clear how you intend to use conditionals or what your other columns contain. Using .loc, setting a value for a given date is pretty straightforward:
# On Feb 1, 2020, set a value in column 'var'.
df.loc['2020-02-01', 'var'] = 0.727868
  • This could also be used for ranges:
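# Assuming you have a second `df2` which has a datetime column 'date' with the
# data you wish to add to `df`. This will only work if all df2['date'] are found
# in df.index. You can work out the logic for your case.
# .to_numpy() assigns the values by position; a plain Series would be aligned on
# df2's own index, which does not match the dates in df.index, and yield NaN.
df.loc[df2['date'], 'var2'] = df2['vals'].to_numpy()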
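
The steps above can be put together in a small end-to-end sketch. The two frames and their values are made up for illustration (the question did not include data), and the `isin` filter is one possible way to handle dates in `df2` that are missing from `df`. Rows of `df` without a match simply keep their initial 0.

```python
import pandas as pd

# Hypothetical main frame: a 'date' column and a 'var' column pre-filled with 0.
df = pd.DataFrame({
    "date": ["2020-01-30", "2020-01-31", "2020-02-01", "2020-02-02"],
    "var": [0.0, 0.0, 0.0, 0.0],
})

# Hypothetical second frame: fewer rows, and one date that df does not have.
df2 = pd.DataFrame({
    "date": ["2020-01-31", "2020-02-02", "2020-03-15"],
    "vals": [0.5, 0.727868, 9.9],
})

# Convert both date columns and index the main frame by date.
df["date"] = pd.to_datetime(df["date"])
df2["date"] = pd.to_datetime(df2["date"])
df = df.set_index("date")

# Keep only the rows of df2 whose date exists in df.index, so that .loc
# is never asked for a label it does not know.
known = df2[df2["date"].isin(df.index)]

# Assign by position (.to_numpy()) so pandas does not try to align on
# df2's integer index.
df.loc[known["date"], "var"] = known["vals"].to_numpy()

print(df)
```

Pre-filling the column with 0 covers the "simply adds a 0" case from the question; starting from NaN and calling `fillna(0)` afterwards would be an equivalent choice.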

If the logic is too complex and the dataframe is not too large, iterating with .iterrows could be easier, especially if you are just starting out with Python.

for idx, row in df.iterrows():
    if idx in list_of_other_dates:
        # idx is the row label (the date), so use it to write back into df.
        df.loc[idx, 'var'] = (some code here)
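
As a runnable illustration of the loop above, here is a minimal sketch. The data and the `other_values` dictionary (a date-to-value lookup) are hypothetical stand-ins for whatever logic replaces the placeholder; they are not from the question.

```python
import pandas as pd

# Hypothetical main frame, already indexed by date, with 'var' pre-filled with 0.
df = pd.DataFrame(
    {"var": [0.0, 0.0, 0.0]},
    index=pd.to_datetime(["2020-01-31", "2020-02-01", "2020-02-02"]),
)

# Hypothetical lookup from date to the value that should be inserted.
other_values = {
    pd.Timestamp("2020-02-01"): 0.727868,
    pd.Timestamp("2020-03-15"): 9.9,  # not in df.index, so it is never used
}

for idx, row in df.iterrows():
    # idx is the row label (a Timestamp); rows without a match are skipped.
    if idx in other_values:
        df.loc[idx, "var"] = other_values[idx]

print(df)
```

A dictionary keeps the membership test cheap, which matters little at 2001 rows but avoids scanning a list on every iteration.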

Please clarify your problem a bit and you will get better answers. Do not forget to check the documentation.