I need to read a list of HTML files into pandas DataFrames.

  1. Each HTML file contains multiple tables, which I combine into one DataFrame with pd.concat.
  2. Each HTML file name contains a string that I would like to add as a column.

import glob
import pandas as pd

# Collect the matching file names into a list
files = glob.glob('monthly_*.html')

# Zip the dfs with the desired string segment
zipped_dfs = [zip(pd.concat(pd.read_html(file)), file.split('_')[1]) for file in files]

I am having trouble unpacking the zipped list of (df, product) pairs.

dfs = []

# Loop through the list of zips
for _zip in zipped_dfs:

    # Unpack the zip
    for _df, product in _zip:

        # Adding the product string as a new column
        _df['Product'] = product
        dfs.append(_df)

However, I am getting the error: 'str' object does not support item assignment

Could someone explain the best way to add the new column?

yongsheng
  • Have you made sure that `zipped_dfs` has the right values? – megamind Apr 04 '20 at 00:47
  • Please provide a [mcve], as well as the entire error message. – AMC Apr 04 '20 at 01:28
  • Does this answer your question? ['str' object does not support item assignment in Python](https://stackoverflow.com/questions/10631473/str-object-does-not-support-item-assignment-in-python) – AMC Apr 04 '20 at 01:28

1 Answer

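You should remove the zip call from the list comprehension. zip(df, string) iterates over both arguments, pairing the DataFrame's column labels with the individual characters of the string, so _df in your loop is a column label (a str) rather than a DataFrame. That is why the assignment fails. If you want a tuple of the concatenated DataFrame and the product name, write: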

zipped_dfs = [(pd.concat(pd.read_html(file)), file.split('_')[1]) 
              for file in files]
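
To make the difference concrete, here is a minimal sketch. The DataFrame and the product string are made-up stand-ins for what pd.read_html and file.split would return, not data from the question:

```python
import pandas as pd

# Hypothetical stand-in for one concatenated table from pd.read_html
df = pd.DataFrame({"month": ["Jan", "Feb"], "sales": [10, 20]})
product = "widgets.html"  # hypothetical result of file.split('_')[1]

# Iterating a DataFrame yields its column labels, so zip pairs
# each label with one character of the string
pairs = list(zip(df, product))
print(pairs)  # [('month', 'w'), ('sales', 'i')]

# A plain tuple keeps the DataFrame and the string whole
pair = (df, product)
print(type(pair[0]).__name__, pair[1])  # DataFrame widgets.html
```

With the tuple form, each list element unpacks directly as `for _df, product in zipped_dfs:` and no inner loop is needed.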

However, the intermediate step of creating a list of tuples is not needed. The entire approach can be simplified as follows:

import glob
import pandas as pd

dfs = []
for file in glob.glob('monthly_*.html'):
    # NOTE: your split keeps '.html' in the product name,
    # so I strip the extension before splitting on '_'
    df = pd.concat(pd.read_html(file))
    df['Product'] = file.split('.html')[0].split('_')[1]
    dfs.append(df)
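
If you eventually want a single DataFrame rather than a list, one possible variation is sketched below. The helper names product_from_filename and load_monthly_tables are my own, and the sketch assumes the product is always the segment right after the first underscore:

```python
from pathlib import Path

import pandas as pd


def product_from_filename(filename: str) -> str:
    """Return the segment after the first underscore, without the extension."""
    # Path.stem drops the final suffix: 'monthly_widgets.html' -> 'monthly_widgets'
    return Path(filename).stem.split("_")[1]


def load_monthly_tables(pattern: str = "monthly_*.html") -> pd.DataFrame:
    """Read every matching file and stack the tables into one DataFrame."""
    frames = []
    for path in sorted(Path(".").glob(pattern)):
        # read_html returns a list of DataFrames, one per <table> in the file
        df = pd.concat(pd.read_html(str(path)), ignore_index=True)
        df["Product"] = product_from_filename(path.name)
        frames.append(df)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling load_monthly_tables() in the directory that holds the files would return all tables stacked together, with the Product column identifying the source file.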
Eric Truett