How do you read two CSV files in Python using pandas and append them to each other?

Question

I am working on an application that imports CSV files and displays each row as a banner. My current method displays the banners correctly, but each call overwrites the array I am using. What I would like to do is read a CSV file, append it to an array, and then have the option to read another CSV file into the same array, so that another method can read that array and save both sets of data to a new CSV file.

Here is my method for retrieving the data:

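# Imports assumed from the names used below
import pandas as pd
from tkinter import Tk, filedialog as fd

def GetData(root, self, data_dictionary, data_arr):
    # Temporary window, minimised to an icon while the dialog is open
    temp = Tk()
    temp.wm_state('iconic')

    filetypes = (
        ('CSV', '*.csv'),
        ('All files', '*.*')
    )

    filename = fd.askopenfilename(
        title='Open a file',
        initialdir='./geodata/',
        filetypes=filetypes)

    if filename:
        # read_csv opens the file itself and returns a new DataFrame,
        # so the data_arr argument is replaced rather than extended
        data_arr = pd.read_csv(filename)
        temp.destroy()
        return data_arr
    else:
        # Alert is not shown in the question
        Alert(title='Error', text='Please select a file')
        temp.destroy()
        return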

I then have another method that calls GetData:

def upload_file(root, self):
    global data_arr
    values = GetData(self, root,  data_arr) #Where data_arr is defined in the global scope as []
    data_arr.append(values)
    print(data_arr)

The first time I upload a file, this is what it prints:

[(  publicid   eventtype                origintime          modificationtime   longitude  latitude  magnitude  ...  usedphasecount usedstationcount  magnitudestationcount minimumdistance azimuthalgap originerror magnitudeuncertainty
0  testid1  earthquake  2021-08-24T04:49:39.852Z  2021-08-24T04:54:00.721Z  177.428024  -37.3592   2.342669  ...              16               12                      6        0.486189   258.299072    0.542223                    0     
1  testid2  earthquake  2021-08-24T04:49:39.852Z  2021-08-24T04:54:00.721Z  177.428024  -37.3592   2.342669  ...              16               12                      6        0.486189   258.299072    0.542223                    0

The problem with this method is that it prints a result like this when I try to upload a second file:

[(  publicid   eventtype                origintime          modificationtime   longitude  latitude  magnitude  ...  usedphasecount usedstationcount  magnitudestationcount minimumdistance azimuthalgap originerror magnitudeuncertainty
0  testid1  earthquake  2021-08-24T04:49:39.852Z  2021-08-24T04:54:00.721Z  177.428024  -37.3592   2.342669  ...              16               12                      6        0.486189   258.299072    0.542223                    0     
1  testid2  earthquake  2021-08-24T04:49:39.852Z  2021-08-24T04:54:00.721Z  177.428024  -37.3592   2.342669  ...              16               12                      6        0.486189   258.299072    0.542223                    0     

[2 rows x 21 columns],), (  publicid   eventtype                origintime          modificationtime   longitude  latitude  magnitude  ...  usedphasecount usedstationcount  magnitudestationcount minimumdistance azimuthalgap originerror magnitudeuncertainty
0  testid1  earthquake  2021-08-24T04:49:39.852Z  2021-08-24T04:54:00.721Z  177.428024  -37.3592   2.342669  ...              16               12                      6        0.486189   258.299072    0.542223                    0     
1  testid2  earthquake  2021-08-24T04:49:39.852Z  2021-08-24T04:54:00.721Z  177.428024  -37.3592   2.342669  ...              16               12                      6        0.486189   258.299072    0.542223                    0     

[2 rows x 21 columns],)]

Is there a way to read the second CSV file and store it in the original array together with the existing data, like this:

[(  publicid   eventtype                origintime          modificationtime   longitude  latitude  magnitude  ...  usedphasecount usedstationcount  magnitudestationcount minimumdistance azimuthalgap originerror magnitudeuncertainty
0  testid1  earthquake  2021-08-24T04:49:39.852Z  2021-08-24T04:54:00.721Z  177.428024  -37.3592   2.342669  ...              16               12                      6        0.486189   258.299072    0.542223                    0     
1  testid2  earthquake  2021-08-24T04:49:39.852Z  2021-08-24T04:54:00.721Z  177.428024  -37.3592   2.342669  ...              16               12                      6        0.486189   258.299072    0.542223                    0     
3  testid3  earthquake  2021-08-24T04:49:39.852Z  2021-08-24T04:54:00.721Z  177.428024  -37.3592   2.342669  ...              16               12                      6        0.486189   258.299072    0.542223                    0     
4  testid4  earthquake  2021-08-24T04:49:39.852Z  2021-08-24T04:54:00.721Z  177.428024  -37.3592   2.342669  ...              16               12                      6        0.486189   258.299072    0.542223                    0     

[4 rows x 21 columns],)]

score 0 · Answer 1 · answered Sep 28 '21 at 04:11

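The problem is that each time you call upload_file, you append the whole object returned by GetData to data_arr as a single new element. The list therefore ends up holding one nested element per upload instead of one flat collection of rows. For plain lists, try data_arr.extend(values) instead of data_arr.append(values). Note that iterating over a pandas DataFrame yields its column labels, so if values is a DataFrame (as returned by pd.read_csv), extend would add column names rather than rows; in that case the frames have to be combined with pd.concat.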

Example:

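foo = ["a", "b"]
bar = ["c", "d"]
foo.append(bar)  # append mutates foo in place and returns None
print(foo)  # ['a', 'b', ['c', 'd']]

baz = ["e", "f"]
qux = ["g", "h"]
baz.extend(qux)  # extend also mutates in place and returns None
print(baz)  # ['e', 'f', 'g', 'h']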
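
Since GetData returns a pandas DataFrame rather than a plain list, one possible approach is to keep every uploaded frame in a list and stack them row-wise with pd.concat when the combined data is needed. The following is only a minimal sketch under that assumption: the two in-memory CSV strings, the frames list, and the add_file helper are hypothetical stand-ins for the uploaded files, data_arr, and upload_file, and the column set is shortened for readability.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two uploaded CSV files (not from the question).
first_csv = io.StringIO("publicid,magnitude\ntestid1,2.34\ntestid2,2.34\n")
second_csv = io.StringIO("publicid,magnitude\ntestid3,2.34\ntestid4,2.34\n")

frames = []  # plays the role of data_arr: one DataFrame per uploaded file


def add_file(source):
    """Read one CSV and keep its DataFrame alongside the earlier ones."""
    frames.append(pd.read_csv(source))


add_file(first_csv)
add_file(second_csv)

# Stack all frames row-wise; ignore_index=True renumbers the rows from 0.
combined = pd.concat(frames, ignore_index=True)
print(combined)

# Write both sets of data back out as one CSV (here to an in-memory buffer).
out = io.StringIO()
combined.to_csv(out, index=False)
```

In the application, out could be replaced by a file path, and add_file by the existing dialog-based upload. Appending to the list and concatenating once at the end avoids rebuilding the combined frame on every upload.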
