0

I am trying to merge multiple columns where after one column the following column starts in a specific index. for example, as you can see in the code below, I have 15 sets of data from df20 to df90. As seen in the code, I have merge the data i and then followed by another starting from index = 1,000.

So I wanted my output to be df20 followed by df25 starting at index=1000, then followed by df30 starting at index=2000, then followed by df35 at index=3000. I wanted to see all 15 columns but I only have one column in my output.

I have tried it below, but doesn't seem to work. Please help.

dframe = [df20, df25, df30, df35, df40, df45, df50, df55, df60, df65, df70, df75, df80, df85, df90]
for i in dframe:
  a = i.merge((i).set_index((i).index+1000), how='outer', left_index=True, right_index=True)

print(a)

Output:

                      df90_x              df90_y
0                     0.000757                      NaN
1                     0.001435                      NaN
2                     0.002011                      NaN
3                     0.002497                      NaN
4                     0.001723                      NaN
...                        ...                      ...
10995                      NaN             1.223000e-12
10996                      NaN             1.305000e-12
10997                      NaN             1.809000e-12
10998                      NaN             2.075000e-12
10999                      NaN             2.668000e-12

[11000 rows x 2 columns]

Expected Output:

                      df20                 df25                  df30
0                     0.000757             0                     0
1                     0.001435             0                     0
2                     0.002011             0                     0
3                     0.002497             0                     0
4                     0.001723             0                     0
...                  ...                   ...                   ...
1000                                      1.223000e-12           0
1001                                      1.305000e-12           0
1002                                      1.809000e-12           0
1003                                      2.668000e-12           0
...                                                              ...
2000                                                             0.1234
2001                                                             0.4567
2002                                                             0.8901
2003                                                             0.2345
I'mahdi
  • 23,382
  • 5
  • 22
  • 30
Kim Yejun
  • 61
  • 6
  • 1
    That is doing what `merge` is expected to do. Try `pd.concat(dframe, axis=1)` – ThePyGuy Aug 12 '21 at 06:17
  • 1
    what would you like the output to be? Why is the output you got wrong? Explaining this would help answer a lot of questions I have about the format of your data. – Marijn van Vliet Aug 12 '21 at 06:22
  • @MarijnvanVliet So I wanted my output to be df20 followed by df25 starting at index=1000, then followed by df30 starting at index=2000, then followed by df35 at index=3000. – Kim Yejun Aug 12 '21 at 06:34
  • @ThePyGuy I wanted to try javing my output to be df20 followed by df25 starting at index=1000, then followed by df30 starting at index=2000, then followed by df35 at index=3000. – Kim Yejun Aug 12 '21 at 06:35
  • 1
    If it's only the index you are concerned about, you can use `pd.concat`. Post a small sample from the dataframes, and also add the expected output for the sample data. Please take a look at [How to ask](https://stackoverflow.com/help/how-to-ask) and [How to make good pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – ThePyGuy Aug 12 '21 at 06:37
  • @ThePyGuy I'm sorry for being vague. I have included my expected out in my question. Thank you for the tip. – Kim Yejun Aug 12 '21 at 06:54
  • Did you already store objects(dataframes) into `dframe` list actually? Cause you have thousands of dataframes, it couldn't be manually loaded from`.csv` files. During the process of loading data into dataframe, the dataframe list should be collected at the same time. – Baron Legendre Aug 13 '21 at 10:39

2 Answers2

1

please refer to official page.


Concat multiple dataframes

df1=pd.DataFrame(
        {
            "A":["A0","A1","A2","A3"]
        },
        index=[0, 1, 2, 3]
)
df2=pd.DataFrame(
        {
            "B":["B4","B5"]
        },
        index=[4, 5]
)
df3=pd.DataFrame(
        {
            "C":["C6", "C7", "C8", "C9", "C10"]
        },
        index=[6, 7, 8, 9, 10]
)
result = pd.concat([df1, df2, df3], axis=1)
display(result)

Output:

      A    B    C
0    A0  NaN  NaN
1    A1  NaN  NaN
2    A2  NaN  NaN
3    A3  NaN  NaN
4   NaN   B4  NaN
5   NaN   B5  NaN
6   NaN  NaN   C6
7   NaN  NaN   C7
8   NaN  NaN   C8
9   NaN  NaN   C9
10  NaN  NaN  C10

Import file into a list via looping

method 1: you can create a list to put whole filenames into a list

filenames = ['sample_20.csv', 'sample_25.csv', 'sample_30.csv', ...]
dataframes = [pd.read_csv(f) for f in filenames]

method 1-1: If you do have lots of files then you need a faster way to create the name list

filenames = ['sample_{}.csv'.format(i) for i in range(20, 90, 5)]
dataframes = [pd.read_csv(f) for f in filenames]

method 2:

from glob import glob
filenames = glob('sample*.csv')
dataframes = [pd.read_csv(f) for f in filenames]
Baron Legendre
  • 2,053
  • 3
  • 5
  • 22
  • Thank you very much for this answer. Actually I have tried it in this similar way however I have a lot of dataframes, around a thousand or more to concat so I was trying to figure out how to do it using a for loop function instead. – Kim Yejun Aug 12 '21 at 07:07
  • I updated the post about the list looping of dataframes – Baron Legendre Aug 12 '21 at 07:30
  • Perhaps you need to use this way to arrange the index respectively, make sure there won't be any overlapping index, then doing the "`concat` whole" process – Baron Legendre Aug 12 '21 at 07:37
  • Sorry, but can you enlighten me with the list looping? – Kim Yejun Aug 17 '21 at 08:06
  • Updated looping file into a list, after this step, you might need to deal with index in every single dataframe from the `dataframes` list – Baron Legendre Sep 02 '21 at 17:12
1

you can try this code, if you want variable for num_dataframe , length_dataframe:

import pandas as pd
import random

dframe = list()
num_dataframe = 3
len_dataframe = 5

for i in range((num_dataframe)):
    dframe.append(pd.DataFrame({i:[random.randrange(1, 50, 1) for i in range(len_dataframe)]},
                               index=range(i*len_dataframe, (i+1)*len_dataframe)))


result = pd.concat([dframe[i] for i in range(num_dataframe)], axis=1)

result.fillna(0)

output:

enter image description here

and for your question, you want 20 data frame with 1000 length, you can try this:

import pandas as pd
import random

dframe = list()
num_dataframe = 20
len_dataframe = 1000

for i in range((num_dataframe)):
    dframe.append(pd.DataFrame({i:[np.random.random() for i in range(len_dataframe)]},
                               index=range(i*len_dataframe, (i+1)*len_dataframe)))


result = pd.concat([dframe[i] for i in range(num_dataframe)], axis=1)

result.fillna(0)

output:

enter image description here

as you mentioned in the comment, I edit the post and add this code:

dframe = [df20, df25, df30, df35, df40, df45, df50, df55, df60, df65, df70, df75, df80, df85, df90]

result = pd.concat([dframe[i] for i in range(len(dframe))], axis=0)

result.fillna(0)
I'mahdi
  • 23,382
  • 5
  • 22
  • 30
  • So how do I exactly input in my dataframes? I see those are only random numbers. can you enlighten me on this part please? – Kim Yejun Aug 12 '21 at 08:52
  • @KimYejun, I edit the post and add code as you request, maybe this helps you. – I'mahdi Aug 12 '21 at 09:28
  • I have tried your code but the data inside the dataframes were not shown. everything is just zero :( – Kim Yejun Aug 17 '21 at 08:04
  • @KimYejun, I send three code blocks, which code block did you run? – I'mahdi Aug 17 '21 at 08:25
  • Yes I did run everything. First code is the one I wanted my output to be. However I the values are random. So you gave me the 3rd code, which gives me the values of my dataframes from df20 to df90, but when I ran it, it only shows zeros not the actual values in my data frame :( I'm sorry, maybe I'm just really not good with coding – Kim Yejun Aug 18 '21 at 03:03
  • @KimYejun, check this: `pd.concat(dframe)` or check this link: https://stackoverflow.com/questions/32444138/concatenate-a-list-of-pandas-dataframes-together or you can run code here then I can check: https://onecompiler.com/python/3x8t7ybsc – I'mahdi Aug 18 '21 at 04:45