Pandas Dataframe always creating new index column

Question

I am trying to append a dataframe csv_objects to another dataframe result, to get one combined dataframe. Both have exactly the same columns and are created in the same way, as in the code below. One of them is saved to CSV and later read back, and that is when I try to combine it with the newly created one.

        result = pd.DataFrame(data=np.reshape(self.get_data_from_object(object_id), (1, 14)),
                              columns=("corners", "parts", "sharp", "steep",
                                       "flat", "flat_count", "over_air", "object_overhang", "bridges", "thin",
                                       "total_area", "length_width", "length_height", "smallest_area"))

The data comes from here (all values are floats); this is the return statement of get_data_from_object:

        return corners, parts, sharp, steep, flat, flat_count, over_air, object_overhang, bridges, thin, total_area, length_width, length_height, smallest_area

I've tried combining them like this:

        csv_objects.loc[csv_objects.index.size]=result.loc[0]

or like this:


        csv_objects.append(result)

Code to reproduce the issue:

import numpy as np
import pandas as pd

path = "."  # assumed output directory; the original post does not define path

data = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

return_array = pd.DataFrame(data=np.reshape(data, (1, 14)),
                            columns=("corners", "parts", "sharp", "steep",
                                     "flat", "flat_count", "over_air", "object_overhang", "bridges", "thin",
                                     "total_area", "length_width", "length_height", "smallest_area"))

# Save the first frame to CSV.
return_array.to_csv(path + "/save.csv")

# Later: read the saved frame back.
csv_objects = pd.read_csv(path + "/save.csv")

# Build the new frame in memory, the same way as the saved one.
result = pd.DataFrame(data=np.reshape(data, (1, 14)),
                      columns=("corners", "parts", "sharp", "steep",
                               "flat", "flat_count", "over_air", "object_overhang", "bridges", "thin",
                               "total_area", "length_width", "length_height", "smallest_area"))

# Try to add the new row at the end of the loaded frame.
csv_objects.loc[csv_objects.index.size] = result.loc[0]

print(csv_objects)

But this always creates a new index column, so the resulting dataframe has 16 columns even though the old frames have 15 each (14 values and 1 index), which is not what I want. How can I prevent that and make them share the same index? That is, I need the new frame's first row to continue from the old frame's last index value.

When I print the individual frames, the output looks like this:

[1 rows x 15 columns]
   Unnamed: 0  corners  parts  ...  length_width  length_height  smallest_area
0           0      0.0    0.0  ...           1.0            0.5            1.0

When I print the combined frame, it looks like this:

[1 rows x 16 columns]
   Unnamed: 0  Unnamed: 0.1  ...  length_height  smallest_area
1           1           NaN  ...            0.5            1.0

A few questions: 1) Is all your data contained in CSV files, with same columns, but different rows? 2) what does get_data_from_object return? — benvdh, Apr 29 '21 at 22:15
Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. — Prune, Apr 29 '21 at 22:16
Edited the question to clarify, and provided a code example that might help. – Test12, Apr 29 '21 at 22:41
@Test12 I updated my answer with a suggestion to make the index continue... I'm still puzzled by the questions: Why are you using np.reshape, to provide data? Why do you name the output of pd.read_csv a csv_object, as in fact it produces a dataframe already? — benvdh, Apr 29 '21 at 23:01
The data I am using needs to be reshaped because it is just a list of 14 values. It is called csv_objects because that is the part of the data that is loaded from CSV; the other frame is created in place. – Test12, Apr 29 '21 at 23:07

benvdh · Answer 1 · 2021-04-29T22:52:24.983

Not sure if you are just trying to combine a bunch of CSV files with the same column names and order, but if that's the case, probably the following should suffice:

import pandas as pd

df1 = pd.read_csv("my_file.csv")
df2 = pd.read_csv("my_other_file.csv")

combined_df = pd.concat([df1, df2])

To make combined_df use a single continuous index (rather than keeping the separate indexes of df1 and df2), use reset_index:

combined_df.reset_index(drop=True, inplace=True)
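As a minimal sketch of what the two calls above do to the row labels, the snippet below uses small in-memory frames instead of CSV files. The two-column frames are hypothetical stand-ins for the 14-column frames in the question, and `ignore_index=True` is shown only as an equivalent one-step alternative, not as part of the original answer.

```python
import pandas as pd

# Hypothetical two-column frames standing in for the 14-column frames in the question.
df1 = pd.DataFrame({"corners": [1.0, 2.0], "parts": [3.0, 4.0]})
df2 = pd.DataFrame({"corners": [5.0], "parts": [6.0]})

# Plain concat keeps each frame's own row labels, so the label 0 appears twice.
stacked = pd.concat([df1, df2])
print(list(stacked.index))  # [0, 1, 0]

# reset_index(drop=True) discards the old labels and renumbers the rows from 0.
renumbered = stacked.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1, 2]

# ignore_index=True on concat gives the same result in a single call.
combined = pd.concat([df1, df2], ignore_index=True)
print(list(combined.index))  # [0, 1, 2]
```

Note that neither call adds or removes columns; they only change the row labels.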

edited Apr 29 '21 at 22:52

answered Apr 29 '21 at 22:18

reset_index still produces the same result; I only load one of the dataframes from CSV. – Test12 Apr 29 '21 at 22:59
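The `Unnamed: 0` header in the question's output suggests that the extra column may come from the CSV round trip rather than from the append itself: by default `to_csv` writes the row index as an unnamed first column, and `read_csv` then loads it as ordinary data. The sketch below illustrates that behaviour and two possible ways around it. It assumes this is the cause in the asker's setup, uses a hypothetical two-column frame, and writes to an in-memory string instead of `save.csv`.

```python
import io

import pandas as pd

# Hypothetical two-column frame standing in for the 14-column frame in the question.
columns = ["corners", "parts"]
saved = pd.DataFrame([[1.0, 1.0]], columns=columns)

# Default to_csv writes the index as an unnamed first column, which
# read_csv then loads as a data column called "Unnamed: 0".
with_index = pd.read_csv(io.StringIO(saved.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'corners', 'parts']

# Option 1: do not write the index when saving.
no_index = pd.read_csv(io.StringIO(saved.to_csv(index=False)))

# Option 2: keep the file as it is, but read the first column back as the index.
as_index = pd.read_csv(io.StringIO(saved.to_csv()), index_col=0)

# Either way the loaded frame has only the original columns again, so a new
# row can be added and the index continued without an extra column.
result = pd.DataFrame([[2.0, 2.0]], columns=columns)
combined = pd.concat([no_index, result], ignore_index=True)
print(list(combined.columns))  # ['corners', 'parts']
print(list(combined.index))    # [0, 1]
```

If a frame that already contains an `Unnamed: 0` column is saved and loaded again with the defaults, a second unnamed column would be expected to appear, which would be consistent with the `Unnamed: 0.1` column in the combined output.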
