0

I want to delete some rows from a CSV file by saving a new CSV after a validation process. I wrote the code below but it causes an error.

with open(path_to_read_csv_file, "r") as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for line in csv_reader:
        # if validation(line[specific_column]):
            try:
                df = pd.DataFrame(line)
                df.to_csv(path_to_save_csv_file)

            except Exception as e:
                print('Something Happend!')
                print(e)
                continue

Error:

Something Happend!
If using all scalar values, you must pass an index

I've also tried to add an index value by df = pd.DataFrame(line, index=[0]), but it only stores the first line with an additional empty column at the beginning. How can solve this?

Another version with line works but I can not reach a specific key value at each line:

inFile = open(path_to_read_csv_file, 'r')
outFile = open(path_to_save_csv_file, 'w')

for line in inFile:
    try:
        print('Analysing:', line)

        # HERE, how can I get the specific column value? I used to use line[specific_column] in the last version
        if validation(line[specific_column]):
            outFile.write(line)
        else:
            continue

    except Exception as e:
        print('Something Happend!')
        print(e)
        continue

outFile.close()
inFile.close()
ECub Devs
  • 165
  • 3
  • 10

2 Answers2

0

This should help you. Basically, you cannot create a DataFrame from scalar-values only. They have to be wrapped in eg. a list.

Marcus
  • 943
  • 5
  • 21
0

The constructor pd.DataFrame expects you to tell how the data that you have provided has to be indexed as well. This is documented here.

The function csv.DictReader uses

the values in the first row of file f will be used as the fieldnames.

For more information, refer to the csv documentation.

Hence, each line that is parsed by the csv_reader is a dictionary where the keys are the CSV header and the values are each the row in the particular line.

So for example, if my CSV is:

Header1, Header2, Header3
1,2,3
11,11,33

Then in the first iteration, the line object would be:

{'Header1': '1', 'Header2': '2', 'Header3': '3'}

Now when you supply this to pd.DataFrame, you need to specify what the data is and what the headers/indices are. In this case, the data is ['1', '2', '3'] and the headers/indices are ['Header1', 'Header2', 'Header3']. These can be extracted by the calls line.values() and line.keys() respectively.

This is the change I have made.

with open(path_to_read_csv_file, "r") as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for line in csv_reader:
        try:
            # validation ...
            df = pd.DataFrame(line.values(), line.keys())
            df.to_csv(path_to_save_csv_file)

        except Exception as e:
            print('Something Happend!')
            print(e)
            continue
  • Thank you for your explanation. But these codes only stores the last line. I've just to use line and it works but I can not reach a specific key value at each line. Please see the edited version. – ECub Devs Oct 17 '20 at 11:19
  • Can you clarify with an example what exactly you are trying to achieve? Like what would be an input to your program and what would you expect as output? – Sumit Chaturvedi Oct 17 '20 at 11:28
  • I have a large csv file of reviews: id, name, comment, ... 2302734, David, This is a very nice restaurant, ... ... And I want to filter some and store the new version. main.csv ---> edited.csv – ECub Devs Oct 17 '20 at 13:35
  • Can you update your question to something that is more indicative of what you are trying to achieve i.e. read a csv, filter the entries and then save it. Then I can edit my answer to let you know how to do that. By the way, [this](https://stackoverflow.com/questions/32968747/filtering-in-pandas-how-to-apply-a-custom-method-lambda) may help you. – Sumit Chaturvedi Oct 17 '20 at 14:19