0

Currently I am working on grabbing data from a csv (string type) and splitting the string (int type) and putting each integer of the string in its own column. The Strings are all the same length, So far I have this code:

import pandas as pd
column_names = ['W1', 'W2', 'W3', 'W4', 'W5', 'W6', 'W7', 'W8', 'W9', 'W10', 'W11', 'W12', 'W13', 'W14', 'W15', 'W16',
                'W17', 'W18', 'W19', 'W20']

    db = pd.read_csv(databasefile, skip_blank_lines=True,
                     names=['A', 'B', 'C', 'D'], header=0)
    db[column_names] = db['B'].str.split(',', expand=True)

This code does work to some extent, the values from the B column are split from the list and values are recorded to the columns in the dataframe, I am able to check this by printing out the column values such as print(db["W2"] where the values are printed.

My problem however is that the data is recorded to the dataframe but not the actual CSV. The columns ['W1', 'W2', 'W3', 'W4', 'W5', 'W6', 'W7', 'W8', 'W9', 'W10', 'W11', 'W12', 'W13', 'W14', 'W15', 'W16', 'W17', 'W18', 'W19', 'W20'] are not in the CSV to fix this I tried to use

db = pd.concat([db, pd.DataFrame(columns=column_names)])

I also tried using

db[column_names] = db['Winning_Numbers'].str.split(',', expand=True).to_csv(databasefile, index=False)

This does work*, the problem is that it overwrites all information in the CSV

Anyhow thankyou for reading! I would appreciate any help with this problem

UPDATE: The Desired Function is to have this CSV enter image description here

Where the B column is a string, is to split the string in column B and take each number in the string and put it into its own column, This is done by the following code

    db[column_names] = db['B'].str.split(',', expand=True)

this works and I am able to read the data in each column, [W1-W20] However the CSV currently only has 4 Columns, I am trying to append the information in the dataframe to the CSV but it only overwrites the current data in there with

to_csv

I tried using the appending mode for the to_csv but that never appended the data in the dataframe to the csv, so hopefully that is more clarification to the problem, on how to append data from the dataframe into the CSV (adding more columns to the CSV and appending data to those columns)

Timberghost_
  • 155
  • 4
  • 16
  • `db` is a pandas dataframe, once you read it from the csv (i.e.created it and filled with the data from the csv file) it has no connection whatsoever to the csv file. If you want to manifest the changes of the dataframe in the csv you'll have to write the dataframe back to the csv with `to_csv` (as you already tried). This will overwrite the file the way you used it. For appending see https://stackoverflow.com/questions/17530542/how-to-add-pandas-data-to-an-existing-csv-file. – Stef Oct 02 '19 at 09:18
  • I looked at the solutions posted, and was still not receiving any results that was appending the new columns and information in the CSV. I tried setting the `to_csv `functions mode to append instead of write as well as the other methods shown. The error given is: `pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 240, saw 21` – Timberghost_ Oct 02 '19 at 17:46
  • I still don't fully understand what you want to do: read the data from `databasefile` and then write (part of) the data back to the same file? Do you just want to convert the file format in the csv? Can you share your databasefile (or the first couple of lines from it as well as line 240) along with the desired result? – Stef Oct 02 '19 at 18:48
  • Sure, I will post it in the overall thread – Timberghost_ Oct 02 '19 at 19:46
  • could you write the result to a different csv? – Stef Oct 02 '19 at 20:49
  • For the moment I am writing the information to a new CSV, I would appreciate if somebody could tell me how to append the information in the dataframe to the CSV, thankyou – Timberghost_ Oct 03 '19 at 01:08

1 Answers1

1

I'm afraid I still don't fully understand what is the desired output but maybe this helps you to get started. It appends the original data with the new columns to the existing data in the csv.

import pandas as pd

column_names = ['W1', 'W2', 'W3', 'W4', 'W5', 'W6', 'W7', 'W8', 'W9', 'W10', 'W11', 'W12', 'W13', 'W14', 'W15', 'W16',
                'W17', 'W18', 'W19', 'W20']

db = pd.read_csv(databasefile, skip_blank_lines=True, names=['A', 'B', 'C', 'D'], header=None)
db[column_names] = db['B'].str.split(',', expand=True)

with open(databasefile, 'a') as f:
    db.to_csv(f, header=False, index=False)

If you want to replace the string column 'B' with the expanded values you can use:

db[['A']+column_names+['C','D']].to_csv(f, header=False, index=False)
Stef
  • 28,728
  • 2
  • 24
  • 52