I am running into an issue where each time I run my program (which reads the dataframe from a .csv file) a new column shows up called 'Unnamed'.

Sample output columns after running it 3 times:

  Unnamed: 0  Unnamed: 0.1            Subreddit  Appearances

Here is my code. In each row, the values in the 'Unnamed' columns simply increase by 1.

df = pd.read_csv(Location)
while counter < 50:
    #gets just the subreddit name
    e = str(elem[counter].get_attribute("href"))
    e = e.replace("https://www.reddit.com/r/", "")
    e = e[:-1]
    if e in df['Subreddit'].values:
        #adds 1 to Appearances if the subreddit is already in the DF
        df.loc[df['Subreddit'] == e, 'Appearances'] += 1
    else:
        #adds new row with the subreddit name and sets the amount of appearances to 1.
        df = df.append({'Subreddit': e, 'Appearances': 1}, ignore_index=True)
    df.reset_index(inplace=True, drop=True)
    print(e)
    counter = counter + 2
#(doesn't work) df.drop(df.columns[df.columns.str.contains('Unnamed', case=False)], axis=1)

The first time I run it, with a clean .csv file, it works perfectly, but each time after that, another 'Unnamed' column shows up. I just wanted the 'Subreddit' and 'Appearances' columns to show each time.

miraculixx
Jack A
  • Have you tried: [Pandas: how to get rid of `Unnamed:` column in a dataframe](https://stackoverflow.com/questions/36519086/pandas-how-to-get-rid-of-unnamed-column-in-a-dataframe) ? – jpp Oct 10 '18 at 00:03
  • please keep the code formatted as I have done in https://stackoverflow.com/posts/52730814/revisions -- the code is easier to read and grasp without whitespace (blank lines serve no purpose other than to add clutter, if you want to separate blocks of code, add a comment line in between blocks, without whitespace lines) – miraculixx Oct 10 '18 at 00:06
  • @jpp I have tried to, but I honestly don't really know how to implement it. – Jack A Oct 10 '18 at 00:11
  • @JackA, For starters, do you have a `to_csv` anywhere in your code? – jpp Oct 10 '18 at 00:12

2 Answers

Another solution would be to read your csv with the parameter index_col=0, so that the saved index column is used as the index instead of becoming a new column: df = pd.read_csv(Location, index_col=0).
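To see why this works, here is a minimal sketch of the round trip. It uses an in-memory buffer instead of a real file, and the sample data is made up for illustration. Writing with the default settings stores the row index as a first column with an empty header, which a plain `read_csv` names `Unnamed: 0`; `index_col=0` turns that column back into the index.

```python
import io
import pandas as pd

# made-up sample data standing in for the real CSV contents
df = pd.DataFrame({'Subreddit': ['python', 'pandas'], 'Appearances': [3, 1]})

# by default, to_csv writes the row index as an extra first column with no header
buf = io.StringIO()
df.to_csv(buf)
csv_text = buf.getvalue()

# a plain read_csv names that headerless column 'Unnamed: 0'
plain = pd.read_csv(io.StringIO(csv_text))
print(list(plain.columns))  # ['Unnamed: 0', 'Subreddit', 'Appearances']

# index_col=0 reads it back as the index instead
fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(fixed.columns))  # ['Subreddit', 'Appearances']
```

Each further save and plain reload would add one more such column, which matches the `Unnamed: 0`, `Unnamed: 0.1` pattern in the question.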

F Blanchet
each time I run my program (...) a new column shows up called 'Unnamed'.

I suppose that's due to reset_index, or maybe you have a to_csv somewhere in your code, as @jpp suggested. To fix the to_csv call, be sure to pass index=False:

df.to_csv(path, index=False)
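If the file has already picked up stray columns, they need to be removed once before saving. Below is a minimal sketch, assuming a made-up CSV held in memory rather than the real file. Note that `drop` returns a new DataFrame, so its result has to be assigned back; calling it without assignment leaves `df` unchanged.

```python
import io
import pandas as pd

# made-up CSV text that already contains two stray index columns
dirty_csv = (
    "Unnamed: 0,Unnamed: 0.1,Subreddit,Appearances\n"
    "0,0,python,3\n"
    "1,1,pandas,1\n"
)
df = pd.read_csv(io.StringIO(dirty_csv))

# drop returns a new DataFrame, so assign the result back to df
stray = df.columns[df.columns.str.contains('Unnamed', case=False)]
df = df.drop(columns=stray)

# save and reload several times; index=False keeps the index out of the file
for _ in range(3):
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    df = pd.read_csv(io.StringIO(buf.getvalue()))

print(list(df.columns))  # ['Subreddit', 'Appearances']
```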

just wanted the 'Subreddit' and 'Appearances' columns

In general, here's how I would approach your task.

The idea is to count all appearances first (keyed by the subreddit name e), then build a new dataframe from these counts and merge it with the one you already have (how='outer' adds rows that don't exist yet). This avoids resetting the index for each element, which should avoid the problem and also performs better.

Here's the code with these thoughts included:

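import pandas as pd
from collections import Counter
base_df = pd.read_csv(Location)
appearances = Counter()  # maps subreddit name -> count for this run
while counter < 50:
    #gets just the subreddit name
    e = str(elem[counter].get_attribute("href"))
    e = e.replace("https://www.reddit.com/r/", "")
    e = e[:-1]
    appearances[e] += 1
    counter = counter + 2
#one row per subreddit seen in this run
appearances_df = pd.DataFrame(
    [{'Subreddit': e, 'appearances': c} for e, c in appearances.items()],
    columns=['Subreddit', 'appearances'])
#outer merge keeps the existing rows and adds subreddits that don't exist yet
df = base_df.merge(appearances_df, how='outer', on='Subreddit')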
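
The merge on its own leaves the stored totals and this run's counts in separate columns. Here is a minimal sketch of folding them into a single `Appearances` column. The sample data and the `seen` list (standing in for the scraped links) are made up for illustration, and the lower-case `appearances` column name follows the code above.

```python
from collections import Counter
import pandas as pd

# made-up stand-ins for the stored CSV and for the names scraped in this run
base_df = pd.DataFrame({'Subreddit': ['python', 'pandas'], 'Appearances': [3, 1]})
seen = ['python', 'rust', 'python']
new_counts = Counter(seen)

new_df = pd.DataFrame(
    [{'Subreddit': name, 'appearances': n} for name, n in new_counts.items()],
    columns=['Subreddit', 'appearances'],
)

# the outer merge keeps every subreddit from both frames
merged = base_df.merge(new_df, how='outer', on='Subreddit')

# a subreddit missing on one side gets NaN there; treat that as 0 and add
merged['Appearances'] = (
    merged['Appearances'].fillna(0) + merged['appearances'].fillna(0)
).astype(int)
df = merged.drop(columns='appearances')
print(df)
```

With this sample data, `python` ends up at 5, while `pandas` and `rust` are both at 1. Saving the result with `index=False` keeps the file at two columns.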
    
miraculixx
  • Thank you for this. I can't believe I did not realize that `index=False` went in the `to_csv`. – Jack A Oct 10 '18 at 00:49