1

I am using Python 3.5. I have several csv files:

The csv files are named according to a fixed structure. They have a fixed prefix (always the same) plus a varying filename part:

099_2019_01_01_filename1.csv
099_2019_01_01_filename2.csv

My original csv files look like this:

filename1-Streetname filename1-ZIPCODE
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
Street1 2012932
Street2 3023923

filename2-Name filename2-Phone
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
Name1 2012932
Name2 3023923

I am manipulating these files using the following code (I am reading the csv files from a source folder and writing them to a destination folder. I am skipping certain rows as I do not want to include this information):

I cut off the TEXT rows, as I do not need them:

import csv
    
skiprows = (1,2,3,4,5,6)
for file in os.listdir(sourcefolder):
    with open(os.path.join(sourcefolder,file)) as fp_in:
        reader = csv.reader(fp_in, delimiter=';')
        rows = [row for i, row in enumerate(reader) if i not in skiprows]
        with open(os.path.join(destinationfolder,file), 'w', newline='') as fp_out:
            writer = csv.writer(fp_out)
            writer.writerows(rows)

(this code works) gives

filename1-Streetname filename1-ZIPCODE
Street1 2012932
Street2 3023923

filename2-Name filename2-Phone
Name1 2012932
Name2 3023923

The first row contains the header. In the header names there is always the filename (however without the 099_2019_01_01_ prefix) plus a "-". The filename ending .csv is missing. I want to remove this "filename-" for each csv file.

The core part now is to get the first row and only for this row to perform a replace. I need to cut off the prefix and the .csv and then perform a general replace. The first replace could be something like this:

  1. Either I could start with a function to cut off the first n signs, as the length is fixed or
  2. According to this solution just use string.removeprefix('099_2019_01_01_')

As I have Python 3.5 I cannot use removeprefix so I try to just simple replace it.

string.replace("099_2019_01_01_","")

Then I need to remove the .csv which is easy:

string.replace(".csv","")

I put this together and I get (string.replace("099_2019_01_01_","")).replace(".csv",""). (Plus at the end the "-" needs to be removed too, see in the code below). I am not sure if this works.

My main problem is now for this csv import code that I do not know how I can manipulate only the first row when reading/writing the csv. So I want to replace this only in the first row. I tried something like this:

import csv
    
skiprows = (1,2,3,4,5,6)
for file in os.listdir(sourcefolder):
    with open(os.path.join(sourcefolder,file)) as fp_in:
        reader = csv.reader(fp_in, delimiter=';')
        rows = [row for i, row in enumerate(reader) if i not in skiprows]
        with open(os.path.join(destinationfolder,file), 'w', newline='') as fp_out:
            writer = csv.writer(fp_out)
            rows[0].replace((file.replace("099_2019_01_01_","")).replace(".csv","")+"-","")
            writer.writerows(rows)

This gives an error as the idea with rows[0] is not working. How can I do this?

(I am not sure if I should try to include this replacing in the code or to put it into a second code which runs after the first code. However, then I would read and write csv files again I assume. So I think it would be most efficient to implement it into this code. Otherwise I need to open and change and save every file again. However, if it is not possible to include it into this code I would be also fine with a code which runs stand-alone and just does the replacing assuming the csv file have the rows 0 as header and then the data comes.)

Please note that I do want to go this way with csv and not use pandas.

EDIT: At the end the csv files should look like this:

Streetname ZIPCode
Street1 9999
Street2 9848

Name Phone
Name1 23421
Name2 23232
BertHobe
  • 217
  • 1
  • 14

1 Answers1

1

Try by replacing this:

rows[0].replace((file.replace("099_2019_01_01_","")).replace(".csv","")+"-","")

By this in your code:

x=file.replace('099_2019_01_01_','').replace('.csv', '')
rows[0]=[i.replace(x+'-', '') for i in rows[0]]
IoaTzimas
  • 10,538
  • 2
  • 13
  • 30
  • Thanks for your answer, however I tried your code and it does only the replacement in the first column? So it does not replace this for all columns in the row. – BertHobe Dec 01 '20 at 14:04
  • Can you paste the 'rows' data here please (for the example in the question)? – IoaTzimas Dec 01 '20 at 14:07
  • Ok, what does rows data mean? What exactly should I post? So when I run your code and check the resulting file filename- was removed from the first column, but the second, for example ZIPcode still looks like filename-ZIPcode. So that is what I mean. – BertHobe Dec 01 '20 at 14:08
  • 1
    I mean the rows that is produced from this line: `rows = [row for i, row in enumerate(reader) if i not in skiprows]` It should be a small list, just paste it here, it would help a lot – IoaTzimas Dec 01 '20 at 14:09
  • Ok, I updated my question. Did you mean this? Oh I see you are talking about something else. How can I get what is in rows? – BertHobe Dec 01 '20 at 14:12
  • 1
    Thanks, I updated my answer, please check again – IoaTzimas Dec 01 '20 at 14:14
  • Thanks for your update, but I think there is a logical error in your answer. Please check again, what I want to remove is dynamically. So the csv contains the dynamical part of the filename. So I need to remove the filename. However, before I need to replace the prefix and the csv. But from the column names in the csv I do not need to replace the prefix and csv. Do you know what I mean? So in my code there is a file. from which I remove something before I use this subpart to remove it from my column names. – BertHobe Dec 01 '20 at 14:20
  • So I run your code, however in the resulting csv files there is still filename-Streetname and filename-ZIPCode in the column names. It was not removed. – BertHobe Dec 01 '20 at 14:20
  • 1
    My suggestion will run dynamically too, have you replaced the code based on my answer? rows will be updated-replaced just before they are saved to the destination csv, for every filename You would better paste a sample of 'rows' that is produced from your original code, so that i will check the stucture of it. Depending on this structure, i will adjust the solution – IoaTzimas Dec 01 '20 at 14:24
  • When I run your code nothing is replaced. So in my resulting csv files there is still filename-Streetname and filename-ZIPcode? I suggest, this is because with your code I remove the prefix and the .csv ending, however these are not in the column names. So I am removing something which is not in it. So nothing gets replaced. I need to remove the "filename-". – BertHobe Dec 01 '20 at 14:27
  • 1
    You want your columns to be just Streetname and ZIPCode? – IoaTzimas Dec 01 '20 at 14:32
  • Yes exactly, not filename-Streetname and filename-ZIPCode. – BertHobe Dec 01 '20 at 14:35
  • In this case, you can directly assign rows[0] with: `rows[0]=['Streetname ; ZIPcode']` or `rows[0]=['Streetname ZIPcode']` or `rows[0]=['Streetname', 'ZIPcode']` depending on the structure of your rows (delimiter, etc). Let me know if (and which one) works, so that i will update answer – IoaTzimas Dec 01 '20 at 14:37
  • I cannot hard-code these as the column names differ from file to file and there are not just two? – BertHobe Dec 01 '20 at 14:40
  • 1
    I see. Please check the new version of my answer. If it doesn't work, i would suggest to just split the words with '-' and keep the last part. – IoaTzimas Dec 01 '20 at 14:52
  • Thanks for your update, I am getting an error 'list' object has no attribute 'replace'. Furthermore please note that I do not want other "-" to be removed from the names. So when it is filename2-Streetname I do want to get it removed. However, when it is "filename2-Street-name" then the second "-" between Street and name should not be removed. – BertHobe Dec 01 '20 at 14:58
  • Exactly what I wanted and the way I wanted it. – BertHobe Dec 01 '20 at 15:10