0

I'm having a bit of a problem trying to get my code to compile. Looks like the line with main_df = df is causing a failure, and I don't quite understand why.

Any help is much appreciated.

import quandl
import pandas as pd

# API key was removed
api_key = 'X'
fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states',flavor='html5lib')

main_df = pd.DataFrame()

for abbv in fiddy_states[0][0][1:]:
    query = "FMAC/HPI_"+str(abbv)
    df = quandl.get(query, authtoken=api_key)

    if main_df.empty:
        main_df = df
    else:
        main_df = main_df.join(df)

print(main_df.head())

I get this error:

Traceback (most recent call last): File "C:/Users/Dave/Documents/Python Files/helloworld.py", line 17, in main_df = main_df.join(df)

File "C:\Python35\lib\site-packages\pandas\core\frame.py", line 4385, in join rsuffix=rsuffix, sort=sort)

File "C:\Python35\lib\site-packages\pandas\core\frame.py", line 4399, in _join_compat suffixes=(lsuffix, rsuffix), sort=sort)

File "C:\Python35\lib\site-packages\pandas\tools\merge.py", line 39, in merge return op.get_result()

File "C:\Python35\lib\site-packages\pandas\tools\merge.py", line 223, in get_result rdata.items, rsuf)

File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 4445, in items_overlap_with_suffix to_rename) ValueError: columns overlap but no suffix specified: Index(['Value'], dtype='object')

wowdavers
  • 111
  • 1
  • 2
  • 9
  • What are you trying to do? Append the new data to the dataframe in each iteration? pd.join does sql-style joins, probably not what you are looking for here. Try `main_df = main_df.append(df)`. – bananafish Aug 30 '16 at 02:50
  • I'm looking to join rather than append in this case. Appending does work, but doesn't give me what I'm looking for. Do you know why it isn't working in this case? – wowdavers Aug 30 '16 at 02:59
  • Well, I don't know what you are looking for. Can you give an example input/output? – bananafish Aug 30 '16 at 03:01
  • Should be a data frame with Date as the index, then 50 columns (each one should be a state abbreviation) with data corresponding to a specific date. – wowdavers Aug 30 '16 at 03:02

1 Answers1

1

You can pass a list of codes to the quandl.get function, then you get a dataframe back with data for each code in a column. Code:

import quandl
import pandas as pd

fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states', flavor='html5lib')
data = quandl.get(["FMAC/HPI_"+s for s in fiddy_states[0][0][1:]])
bananafish
  • 2,877
  • 20
  • 29
  • Hm, looks like it works - thanks a lot! I guess I'm still just a little confused as to why `.join()` won't work in the original instance of my code. – wowdavers Aug 30 '16 at 03:21
  • 1
    It doesn't work because the returned dataframe from quandl has a single column with the name 'Value' for each get query, so when you try to join you need to tell it what to name the columns. If you also set the column name (to say `abbv`) in the original for loop, I think it will work. – bananafish Aug 30 '16 at 03:27
  • I'm a bit of a newbie at using pandas, so bear with me. How do you tell that the dataframe from quandl only has a single column 'Value'? Also, where would I set the column name, and is the name arbitrary? Thanks a lot! :) – wowdavers Aug 30 '16 at 03:31
  • 1
    @D.Wang no worries. I just printed the response object from quandl in each iteration (changed the for loop to `fiddy_states[0][0][1:4]]` only do 3 calls or it would be slow). I can see that there is a single column 'Value' and an index named 'Date' at the top. See [this question](http://stackoverflow.com/questions/11346283/renaming-columns-in-pandas) for how to rename columns. – bananafish Aug 30 '16 at 03:35
  • The column names display in the format "FMAC/HPI_AL - Value" for me (all the same except the state codes). Are you saying that the column names should all just be sans the " - Value" (so in the format of "FMAC/HPI_AL")? Where would I rename the columns (because I 'append' through every iteration)? I would have to rename before I join, correct? – wowdavers Aug 30 '16 at 04:06
  • 1
    I am talking about your for loop that does a single get query per iteration. Here the column name is 'Value'. I think you are talking about the code in my answer, which does all queries in one line, and gives the descriptive column names you describe. If you want to stick with the for loop (which I don't recommend, doing 50 joins will be slow), rename the column 'Value' to the value of `abbv` at each step. Yes, you would have to rename before the join. – bananafish Aug 30 '16 at 04:17
  • Got it, I'll keep it with the code you posted, but I'm glad to see that the for loop works too. Thanks a lot for the help! Is it a given that every time I try using `join()`, I'll have to name the columns? – wowdavers Aug 30 '16 at 04:27
  • 1
    @D.Wang Yeah, you just have to make sure the column names are unique. Just think about what you are trying to do. Pandas usually makes it quite easy and succinct once you know that. – bananafish Aug 30 '16 at 04:30