I want to save a pandas DataFrame as a CSV file. The problem is that to_csv converts the np.array into a string.

I want to save the array as an array, but I could not find anything useful in the documentation.

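import numpy as np
import pandas as pd

# Keep only the positive entries of the assignment.
sudoku_solution = [a for a in assignment if a > 0]

# Repeat the solution n_splits times, one row per split.
label = np.reshape(np.array(sudoku_solution * n_splits),
                   (n_splits, len(sudoku_solution)))

# Each row of label ends up as a single array-valued cell.
df = pd.DataFrame(zip(label))

path = './data/SplitsLabel.csv'
df.to_csv(path_or_buf=path,
          mode='a',
          header=False)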

sudoku_solution = [123, 345, 894, 324, 321, 321] (list of integers)

n_splits = 3 (integer)

The final result should be something like:

0,[123 345 894 324 321 321]

1,[123 345 894 324 321 321]

2,[123 345 894 324 321 321]

But the result now is:

0,"[123 345 894 324 321 321]"

1,"[123 345 894 324 321 321]"

2,"[123 345 894 324 321 321]"

How do I get rid of those quotes?

Victor Zuanazzi
  • You can't save Python objects in a `.csv`; it's just a text file and has no way of knowing what a `list` or `numpy.array` is. If you need to serialize Python objects, look into the [`pickle`](https://docs.python.org/2/library/pickle.html) format (pandas has a `DataFrame.to_pickle()` method). Even if you save it without the quotes in a `.csv`, when you read it back you won't get a numpy array – ALollz Feb 18 '19 at 22:02
  • Thanks, I've lost enough hair on this today! The problem is that I need a format that allows me to append data on the go. As far as I saw, pickle does not allow for extending the document. Or am I wrong? – Victor Zuanazzi Feb 18 '19 at 22:44
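
On the append concern raised in the comments: a file can hold several pickled objects written one after another, and each `pickle.load` call reads back the next one. The following is only a sketch under assumptions: the helper names `append_record` and `read_records`, the temporary path, and the sample values are hypothetical, and it uses the standard `pickle` module directly rather than `DataFrame.to_pickle()`.

```python
import os
import pickle
import tempfile

import numpy as np


def append_record(path, record):
    """Append one pickled object to the end of the file (hypothetical helper)."""
    with open(path, "ab") as handle:
        pickle.dump(record, handle)


def read_records(path):
    """Read back every object that was appended, in order (hypothetical helper)."""
    records = []
    with open(path, "rb") as handle:
        while True:
            try:
                records.append(pickle.load(handle))
            except EOFError:
                # No more pickled objects in the file.
                break
    return records


# Illustrative data only; the path is a throwaway temporary file.
path = os.path.join(tempfile.mkdtemp(), "SplitsLabel.pkl")
label = np.array([123, 345, 894, 324, 321, 321])

for split in range(3):
    append_record(path, (split, label))

records = read_records(path)
```

Each entry in `records` is a `(split, array)` tuple, and the array comes back as a real `numpy.ndarray` rather than a string. Pickle files should only be loaded from sources you trust.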

2 Answers

0

I suspect that, because your output includes commas, pandas is adding the quotes to avoid a conflict with the delimiter. You could try changing your delimiter to a tab so this conflict doesn't happen. You can also change the "quoting" option if changing the delimiter doesn't work for you.

Check out this link for more info: Pandas: use to_csv() with quotation marks and a comma as a seperator
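
As a rough sketch of the delimiter suggestion (the sample frame and variable names are made up for illustration, and the cells hold Python lists so that their string form contains commas), compare the default output with a tab-separated one:

```python
import pandas as pd

# Illustrative frame: each cell is a list, which to_csv writes as "[123, 345, 894]".
df = pd.DataFrame({"label": [[123, 345, 894], [123, 345, 894]]})

# With the default comma delimiter, the commas inside the cell force quoting.
default_csv = df.to_csv(header=False)

# With a tab delimiter, the cell text no longer clashes with the separator.
tab_csv = df.to_csv(header=False, sep="\t")

print(default_csv)
print(tab_csv)
```

This only removes quotes that were caused by the delimiter. A field is also quoted when it contains a line break, which can happen with long NumPy arrays because their printed form wraps across lines. Either way, the cell is still text when the file is read back.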

Allen P.
  • Thanks Allen, but I put the commas there by mistake. I corrected it in the description. If I use the suggestion from your link, the csv is not even saved =/ – Victor Zuanazzi Feb 18 '19 at 19:33
0

If you have this same problem, checking here first may save you some headache.

None of the solutions posted there could solve my problem, so here is the code to parse the string and convert it to the format I need:

    import numpy as np
    import pandas as pd

    df = pd.read_csv(filepath_or_buffer=path_x,
                     header=None,
                     names=["i", "clauses"])

    # it is sad that I have to do that!
    # Strip the brackets, line breaks and commas from the stored string.
    for char in ["[", "]", "\n", ","]:
        df["clauses"] = df["clauses"].str.replace(char, "", regex=False)
    # Split on whitespace and convert each token back to an integer.
    df["clauses"] = df["clauses"].apply(
        lambda x: np.array([int(i) for i in x.split()]))

    # One column per array element. DataFrame.append was removed in
    # pandas 2.0, so the frame is built in one step instead of row by row.
    cols = list(range(120060))
    df_x = pd.DataFrame(np.vstack([row[:len(cols)] for row in df["clauses"]]),
                        columns=cols)

    df = pd.read_csv(filepath_or_buffer=path_y,
                     header=None,
                     names=["i", "label"])

    # astype returns a new frame, so the result has to be assigned back.
    df_x = df_x.astype("int")
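
The parsing step can also be wrapped in one small function, which makes it easier to check on a short sample before running it on the real file. This is only a sketch: `parse_array_string` and the sample CSV text are hypothetical, and unlike the code above it replaces commas with spaces so that both `[1 2 3]` and `[1, 2, 3]` parse.

```python
import io

import numpy as np
import pandas as pd


def parse_array_string(text):
    """Turn a string such as '[123 345 894]' back into an integer array."""
    cleaned = text.replace("[", " ").replace("]", " ").replace(",", " ")
    # split() with no argument handles repeated spaces and line breaks.
    return np.array([int(token) for token in cleaned.split()])


# Illustrative sample: one unquoted and one quoted, comma-separated cell.
csv_text = '0,[123 345 894]\n1,"[123, 345, 894]"\n'

df = pd.read_csv(io.StringIO(csv_text), header=None, names=["i", "clauses"])
df["clauses"] = df["clauses"].apply(parse_array_string)

# A wrapped string, like the ones produced for long arrays, parses the same way.
wrapped = parse_array_string("[123 345\n 894]")
```

After this, every cell in `df["clauses"]` is a `numpy.ndarray` of integers.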
Victor Zuanazzi