
I am appending three CSVs:


import pandas as pd

df = pd.read_csv("places_1.csv")
temp = pd.read_csv("places_2.csv")
df = df.append(temp)
temp = pd.read_csv("places_3.csv")
df = df.append(temp)
print(df.head(20))

the joined table looks like:

  location  device_count  population
0        A            11         NaN
1        B            12         NaN
2        C            13         NaN
3        D            14         NaN
4        E            15         NaN
0        F            21         NaN
1        G            22         NaN
2        H            23         NaN
3        I            24         NaN
4        J            25         NaN
0        K            31         NaN
1        L            32         NaN
2        M            33         NaN
3        N            34         NaN
4        O            35         NaN

As you can see the indices are not unique.

When I run this loop, which uses iloc to set population to twice device_count:

df2 = df.copy()
for index, row in df.iterrows():
    df.iloc[index, df.columns.get_loc('population')] = row['device_count'] * 2

I get the following erroneous result:

  location  device_count  population
0        A            11        62.0
1        B            12        64.0
2        C            13        66.0
3        D            14        68.0
4        E            15        70.0
0        F            21         NaN
1        G            22         NaN
2        H            23         NaN
3        I            24         NaN
4        J            25         NaN
0        K            31         NaN
1        L            32         NaN
2        M            33         NaN
3        N            34         NaN
4        O            35         NaN

Because the indices repeat, the loop keeps overwriting the rows that came from the first CSV. I have also tried creating a new column of integers and calling df.set_index(), but that did not work either.

Any tips?

Steve Scott

2 Answers


First, pass ignore_index=True so the combined frame gets a fresh, unique index. Second, don't use append (it has since been deprecated and removed from pandas); build the frame in one step with pd.concat([temp1, temp2, temp3], ignore_index=True).
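Applied to the question's three CSVs, the concat version might look like this (the in-memory frames below stand in for the CSV contents, so the sketch runs without the files):

```python
import pandas as pd

# Stand-ins for places_1.csv, places_2.csv, and places_3.csv.
frames = [
    pd.DataFrame({"location": list("ABCDE"), "device_count": range(11, 16)}),
    pd.DataFrame({"location": list("FGHIJ"), "device_count": range(21, 26)}),
    pd.DataFrame({"location": list("KLMNO"), "device_count": range(31, 36)}),
]

# ignore_index=True discards each frame's own 0..4 index and
# assigns a single 0..14 RangeIndex to the result.
df = pd.concat(frames, ignore_index=True)
print(df.index.is_unique)  # True
```

With a unique index, positional (iloc) and label (loc) lookups no longer disagree, which is exactly what broke the original loop.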

Igor Rivin

As others have stated, you can use ignore_index, and you probably should use pd.concat here. Alternatively, for other situations where you are not combining DataFrames, you can also use df = df.reset_index(drop=True) to change the indices after the fact.
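A minimal sketch of the reset_index route, using a toy frame rather than the question's data:

```python
import pandas as pd

# Two frames combined the naive way, producing duplicate indices 0 and 1.
df = pd.concat([pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3, 4]})])
print(df.index.tolist())  # [0, 1, 0, 1]

# drop=True discards the old index instead of keeping it as a new column.
df = df.reset_index(drop=True)
print(df.index.tolist())  # [0, 1, 2, 3]
```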

Additionally, you should avoid using iterrows() for the reasons listed in the docs here. The following vectorized assignment works much better:

df.loc[:, 'population'] = df.loc[:, 'device_count'].astype('int') * 2

bug_spray
  • the *2 is just a proxy for a more complex function I am running that involves multiple columns from the row. – Steve Scott May 31 '20 at 16:42
  • why use concat instead of append? – Steve Scott May 31 '20 at 16:47
  • @SteveScott It's more flexible, and in much wider use, so is likely to be more efficient. https://machinelearningknowledge.ai/tutorial-pandas-concat-pandas-append-pandas-merge-pandas-join/ – Igor Rivin May 31 '20 at 16:50
  • Strange. `df = df.reset_index(drop=True)` or `df.reset_index(drop=True, inplace=True)` should work... – bug_spray May 31 '20 at 16:55
  • If vectorization is not an option for your function, I recommend you refer to cs95's answer to this [question](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas/55557758?noredirect=1) for a performance analysis of modifying dataframes with functions. You'll notice `iterrows()` is the last tool you should reach for – bug_spray May 31 '20 at 16:59
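Picking up the first comment above (the *2 is a proxy for a multi-column function): if vectorization really is not an option, one row-wise sketch uses itertuples(), which is considerably faster than iterrows(). The combine() function here is a made-up stand-in for the asker's real logic:

```python
import pandas as pd

df = pd.concat(
    [pd.DataFrame({"location": list("AB"), "device_count": [11, 12]}),
     pd.DataFrame({"location": list("CD"), "device_count": [21, 22]})],
    ignore_index=True,
)

# Hypothetical stand-in for the asker's multi-column function.
def combine(device_count, location):
    return device_count * 2

# Build the whole column at once instead of writing cell by cell,
# so duplicate or non-integer indices can never cause mis-writes.
df["population"] = [
    combine(row.device_count, row.location) for row in df.itertuples()
]
print(df["population"].tolist())  # [22, 24, 42, 44]
```

Assigning the full column in one step also sidesteps the original bug entirely: no per-row iloc writes means no dependence on the index at all.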