
I am trying to build one big dataset from the sliced result of each iteration of the for loop.

The code that I have written is as follows:

import os
import numpy as np
import pandas as pd

# directory, files, ld_up and time are defined earlier in the script
for n in range(4):
    # build the absolute path to the n-th file
    script_dir = os.path.dirname(directory)
    rel_path = files[n]
    abs_file_path = os.path.join(script_dir, rel_path)
    to_open = pd.read_csv(abs_file_path, header=0)
    # trim the address string, then keep only the rows matching this file's address
    to_open["Geographic Address"] = to_open["Geographic Address"].astype(str)
    to_open["Geographic Address"] = to_open["Geographic Address"].map(lambda x: x[3:-1])
    to_open = to_open[to_open["Geographic Address"] == ld_up[n]]
    to_open.index = range(len(to_open))
    # find the target time and take the 30 rows before it
    ind = np.searchsorted(to_open["Time"], time[n])
    ind = np.asscalar(np.array(ind))
    UpperBound = ind - 30
    data = to_open.iloc[UpperBound:ind, :]

As you can see, `data` is overwritten on each pass through the loop, so after slicing only the output from case 3 is left. I would like one big file that includes cases 0, 1, 2, and 3.

Jacky Man
  • Welcome to SO! It's always helpful to include some sample data as text. The easiest way to do this is to paste the output of df.head() into a code block in your question – Charles Landau Dec 11 '18 at 16:27
  • Are you trying to combine the data slices into a single column of data, or multiple columns corresponding to case 0, 1, .., n? – dan_g Dec 11 '18 at 16:29
  • for the iloc, I have sliced the desired data into the range which I want, which is 30 entries. Instead of opening 4 files and only keeping the output of case 3 (i.e. range 4), I would like to build a large dataset using the same slicing and keep accumulating it. – Jacky Man Dec 11 '18 at 16:31
  • right, my question was with regards to how you want to accumulate that data (i.e. stacked on top of each other, or placed side-by-side). It sounds like the former – dan_g Dec 11 '18 at 16:35
  • Sounds like duplicate of: https://stackoverflow.com/questions/32444138/concatenate-a-list-of-pandas-dataframes-together – dan_g Dec 11 '18 at 16:38
  • hi user3014097, I used 4 files as a trial. However, in my actual dataframe, I have got 421 files to look through. It would be very complex to handle 421 dataframes, so I don't know if there is another method to achieve that. Thanks for your input – Jacky Man Dec 11 '18 at 16:41

1 Answer


It looks like you're trying to stack these cases on top of each other, in which case you should append each slice to a list and then concatenate the list:

df_list = []
for n in range(4):
    script_dir = os.path.dirname(directory)
    rel_path = files[n]
    abs_file_path = os.path.join(script_dir, rel_path)
    to_open = pd.read_csv(abs_file_path, header=0)
    to_open["Geographic Address"] = to_open["Geographic Address"].astype(str)
    to_open["Geographic Address"] = to_open["Geographic Address"].map(lambda x: x[3:-1])
    to_open = to_open[to_open["Geographic Address"] == ld_up[n]]
    to_open.index = range(len(to_open))
    ind = np.searchsorted(to_open["Time"], time[n])
    ind = int(ind)  # np.asscalar is deprecated in newer NumPy; int() does the same here
    UpperBound = ind - 30
    data = to_open.iloc[UpperBound:ind, :]
    df_list.append(data)  # collect each slice instead of overwriting it

df = pd.concat(df_list)  # stack all of the slices into one DataFrame
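
With four files and 30 rows per slice, `df` should end up with 120 rows, the slices stacked on top of each other. A quick sanity check (assuming every slice really contains 30 rows):

print(df.shape)  # expect (120, number_of_columns): 4 slices of 30 rows each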
dan_g
  • Thanks for your comment, nearly there. The output from df_list is 120*160, which is what I wanted to see. However, apart from the first case in df, all of the other outputs are NaNs, even though I can view them correctly in df_list. Would that be something to do with the index when I try to concatenate it? My indexes are pretty random and do not start from 1 – Jacky Man Dec 11 '18 at 16:39
  • After reading through your edit again, I have tried it in Python and it works perfectly fine now! Thanks a lot for the input. So for future work, I should create an empty list for all the outputs I have obtained and then concatenate it? Massive thanks for your help! I am very grateful, as I have been thinking about this for half a day. I am a novice in Python – Jacky Man Dec 11 '18 at 16:45
  • No problem. And yes, that's more or less the strategy you'll want to use if you're repeating the same process a bunch of times, where each repetition generates a small data frame (or series) and you want to combine all of those subsets into one large data frame. – dan_g Dec 11 '18 at 16:50
  • If you don't care about the index of the subsets, you can specify `ignore_index` when you call `pd.concat`. See: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html – dan_g Dec 11 '18 at 16:52
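
For illustration, a minimal sketch of what `ignore_index` does, using two toy frames with clashing labels (hypothetical data, not from the question):

import pandas as pd

a = pd.DataFrame({"x": [1, 2]}, index=[10, 11])
b = pd.DataFrame({"x": [3, 4]}, index=[10, 11])

print(pd.concat([a, b]).index.tolist())                     # [10, 11, 10, 11] -- original labels kept
print(pd.concat([a, b], ignore_index=True).index.tolist())  # [0, 1, 2, 3] -- fresh RangeIndex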