Hope you are having a good day.
I have an issue with my code: I have to merge several datasets that I keep in a list called all_csv. The datasets look something like this:
Dataset 1
index device1
GR03 1110
GR20 1121
*GR08* 1109
Dataset 2
index device2
*GR08* 1112
GR01 1114
GR04 1123
(In the code the index is a column called list1)
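To make the example reproducible without the CSV files, the two datasets above can be built in memory. This is only a sketch: the names df1 and df2 are made up for illustration, and the values are copied from the tables above.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for two of the CSV files
df1 = pd.DataFrame({"list1": ["GR03", "GR20", "GR08"],
                    "device1": [1110, 1121, 1109]})
df2 = pd.DataFrame({"list1": ["GR08", "GR01", "GR04"],
                    "device2": [1112, 1114, 1123]})

# Same shape as the list produced by reading the files
all_csv = [df1, df2]
```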
As you can see, the same index value (GR08) can sometimes appear in more than one dataset. So I wrote this:
import glob
import pandas as pd

all_files = glob.glob(path + "/*data.csv")  # select all the *data.csv files in the path
all_csv = [pd.read_csv(f, sep=',') for f in all_files]
all_csv = [df.set_index(df["list1"]) for df in all_csv]  # list1 is the index shown above
all_csv = [df.drop(df.columns[0], axis=1) for df in all_csv]  # drop the first column, which the index now duplicates
df_merged = pd.concat(all_csv, axis=1)  # error here: "Reindexing only valid with uniquely valued Index objects"
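As far as I can tell, this error usually means that at least one DataFrame has a repeated label inside its own index; a label shared between files (like GR08) should not trigger it by itself. A small check along these lines might help narrow it down. The helper name duplicated_labels and the sample data are made up for illustration.

```python
import pandas as pd

def duplicated_labels(df, column="list1"):
    """Return the values of `column` that occur more than once in df."""
    counts = df[column].value_counts()
    return sorted(counts[counts > 1].index)

# Made-up example: GR08 appears twice inside the same file
sample = pd.DataFrame({"list1": ["GR03", "GR08", "GR08"],
                       "device1": [1110, 1109, 1108]})
print(duplicated_labels(sample))
```

Running it over every frame in all_csv (before setting the index) would show which files, if any, contain repeated list1 values.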
I understand that concat needs the same index values in every dataset, so earlier I tried it without setting the index, but I got something like this:
all_files = glob.glob(path + "/*data.csv")
all_csv = [pd.read_csv(f, sep=',') for f in all_files]
df_merged = pd.concat(all_csv, axis=1)
Resulting dataset
index device1 device2 device3
nan 1110 nan 1092
nan 1121 nan nan
*GR08* 1109 1112 1098
nan nan 1114 nan
nan nan 1123 1111
This is correct, but I don't understand why the index values that do not appear in all the datasets show up as nan.
Does anyone have an idea how I can solve this, or a different strategy to solve it?
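One alternative strategy, sketched here only as an illustration, is an outer merge on the list1 column instead of concat. It assumes that keeping the first row for any repeated list1 value within a file is acceptable, which may not hold for the real data. The frames below are made-up stand-ins for the CSV files.

```python
from functools import reduce
import pandas as pd

# Made-up stand-ins for the files read from disk
df1 = pd.DataFrame({"list1": ["GR03", "GR20", "GR08"],
                    "device1": [1110, 1121, 1109]})
df2 = pd.DataFrame({"list1": ["GR08", "GR01", "GR04"],
                    "device2": [1112, 1114, 1123]})
frames = [df1, df2]

# Assumption: if a file repeats a list1 value, keep only its first row
frames = [df.drop_duplicates(subset="list1") for df in frames]

# Outer merge keeps every list1 value; missing devices become NaN
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on="list1", how="outer"),
    frames,
)
print(df_merged)
```

With this approach every row should keep its list1 label, and NaN appears only in the device columns where a file has no value for that label.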
Thanks for all your answers.