I'm trying to work out how to remove the rows of a CSV file whose Date column holds a date starting with 202110 (any day), so that all rows from October 2021 are dropped. I then want to save the CSV under its original name plus 'updated'. I think both parts of my code are wrong: the row removal and the file save. Could you help?

My current code is

import os
import glob
import pandas as pd
from pathlib import Path

sourcefiles = source_files = sorted(Path(r'/Users/path/path/path').glob('*.csv'))


for file in sourcefiles:
 df = pd.read_csv(file)
 df2 = df[~df.Date.str.contains('202110')] 
 df2.to_csv("Updated.csv") # How to save with original file name + word "updated"

Here is an example of one of the CSV files. The cells highlighted in yellow contain October dates; those are the rows I need to remove before saving the CSV with 'updated' in its name. Thanks a lot for the help.

[screenshot of the sample CSV]

  • Last line should look like: df2.to_csv(filename.replace(".csv", "_updated.csv")). Can you provide at least a few rows from your CSV here, so we can check them against your code? – Karol Adamiak Dec 12 '21 at 15:23
  • Does this answer your question? [Search for "does-not-contain" on a DataFrame in pandas](https://stackoverflow.com/questions/17097643/search-for-does-not-contain-on-a-dataframe-in-pandas) tl;dr - your row removal code is fine – Shahan M Dec 12 '21 at 15:30

2 Answers

You can do something like this:

for file in sourcefiles:
    df = pd.read_csv(file)
    df.Date = pd.to_datetime(df.Date)
    condition = ~((df.Date.dt.year == 2021) & (df.Date.dt.month == 10))
    df_new = df.loc[condition]

    name, ext = file.name.split('.')
    df_new.to_csv(f'{name}_updated.{ext}')

This assumes your filenames contain exactly one dot.

Boris Kalinin
  • Hi, thanks for the answer. After running this code, the Date column shows 01/01/1970 00:00:00 for every row. What I need is simply to remove the rows whose date falls in October 2021 (any day), e.g. 20211012, and to leave the remaining rows untouched. – user14566555 Dec 21 '21 at 08:23
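
The 01/01/1970 values are what you would expect if the Date column is read as integers such as 20211012: without an explicit format, pd.to_datetime treats integers as nanoseconds since the epoch. A minimal sketch of one way around this, assuming the dates really are laid out as YYYYMMDD (the sample frame and its Value column are made up for illustration). It parses into a separate Series, so the Date column itself is left as it was:

```python
import pandas as pd

# Hypothetical sample data; assumes Date holds integers laid out as YYYYMMDD.
df = pd.DataFrame({"Date": [20210930, 20211001, 20211012, 20211101],
                   "Value": [1, 2, 3, 4]})

# Parse into a separate Series so the original Date column is not overwritten.
parsed = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

# Keep every row that is NOT in October 2021.
keep = ~((parsed.dt.year == 2021) & (parsed.dt.month == 10))
df_new = df.loc[keep]

print(df_new)
```

If the column uses a different layout, the format string would need to change to match.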

As you use pathlib, you can use file.parent and file.stem:

Replace:

df2.to_csv("Updated.csv")

By:

df2.to_csv(file.parent / f"{file.stem}_updated.csv")
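
Putting the filter and the new file name together, the whole loop might look roughly like the sketch below. It is only a sketch under some assumptions: the column is called Date as in the question, its values start with the year and month (e.g. 20211012), and the helper name drop_month and its prefix parameter are made up here. Reading Date as a string keeps .str.startswith working even when pandas would otherwise infer integers.

```python
from pathlib import Path

import pandas as pd


def drop_month(folder, prefix="202110"):
    """Write <name>_updated.csv next to each CSV in folder, minus rows whose Date starts with prefix."""
    written = []
    for file in sorted(Path(folder).glob("*.csv")):
        if file.stem.endswith("_updated"):
            continue  # skip files produced by an earlier run
        # Force Date to be read as text so string methods work on it.
        df = pd.read_csv(file, dtype={"Date": str})
        # True for rows to keep; missing dates count as "does not start with prefix".
        keep = ~df["Date"].str.startswith(prefix, na=False)
        out = file.with_name(f"{file.stem}_updated.csv")
        df.loc[keep].to_csv(out, index=False)  # index=False avoids an extra index column
        written.append(out)
    return written
```

Calling drop_month(r'/Users/path/path/path') would then process every CSV in that folder and return the paths it wrote.
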
Corralien