
I have an Excel file with several columns; each column holds values to be searched for in a database.

example table

I want to read this file (I am using pandas because it's a very simple way to read Excel files) and extract the information into variables:

Desired information extracted from each row:
Company: Ebay (str)
company_name_for_search: [EBAY, eBay, Ebay] (list of strings)
company_register: [4722, 4721] (list of ints)

With this info, I will run a search script. Some of it must be lists because the script will do a search for every item in the list (for loop).

When I read the Excel file, each column is read as an object dtype in a DataFrame, so I can't access the individual values inside each cell.

How do I split the values, convert their types, and deal with that?

FábioRB

1 Answer


Your variables are represented as single strings rather than rows of strings and numbers.

Instead of:

company_name register
eBay 4722
eBay 4721
Amazon 9999

You have:

company_name register
ebay,ebay 4722,4721
amazon 9999

You can split each string and then explode the resulting Series containing arrays to get a long form DataFrame.

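import pandas as pd

# Each cell holds several comma-separated values in a single string.
mess = pd.DataFrame(
    {
        "letters": ["A,B", "C,D", "E,F,G,H"],
        "nums": ["100,200", "300,400", "500, 600, 700, 800"],
    }
)

# Split every cell on commas, explode to one value per row, and strip
# the stray spaces left around items such as " 600".
# Note: this requires each row to hold the same number of items in every column.
mess = mess.apply(lambda col: col.str.split(",").explode().str.strip())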
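
If the cells in a row can hold different numbers of items (one company, three search names, two registers), exploding every column at once no longer lines up. A possible alternative, shown here only as a sketch, is to keep one record per row and turn each cell into a Python list. The column names come from the question; the sample frame, the `to_list` helper and the `records` structure are hypothetical, and with a real file the frame would come from `pd.read_excel` instead. It also assumes register cells are either comma-separated strings or whole integers.

```python
import pandas as pd

# Hypothetical data mirroring the question's layout; a real frame would
# come from pd.read_excel(...) instead.
df = pd.DataFrame(
    {
        "company": ["Ebay", "Amazon"],
        "company_name_for_search": ["EBAY, eBay, Ebay", "amazon"],
        "company_register": ["4722,4721", 9999],
    }
)


def to_list(cell, cast=str):
    """Split a comma-separated cell into a list, casting each item (hypothetical helper)."""
    if pd.isna(cell):
        return []  # an empty cell becomes an empty list
    # str(cell) also covers cells that Excel stored as a single number
    return [cast(part.strip()) for part in str(cell).split(",") if part.strip()]


# One dict per spreadsheet row: a plain string plus two lists to loop over.
records = []
for row in df.itertuples(index=False):
    records.append(
        {
            "company": str(row.company),
            "names": to_list(row.company_name_for_search),
            "registers": to_list(row.company_register, cast=int),
        }
    )
```

Each entry in `records` can then feed the search script directly, for example by looping over `record["names"]` and `record["registers"]` for every company.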
Joshua Megnauth
  • I'm missing something. I am trying to apply your code but I get two issues: 1) "FutureWarning: reindexing with a non-unique Index is deprecated and will raise in a future version. mess = dfbase.apply(lambda col: col.str.split(",").explode())" and 2) ValueError: cannot reindex on an axis with duplicate labels – FábioRB Aug 04 '22 at 00:55
  • What version of pandas are you using @FábioRB? I haven't received that warning on version 1.4.3. You can check by printing `pd.__version__`. – Joshua Megnauth Aug 06 '22 at 00:12
  • 1.4.3. I think I found the issue. Your example has the same number of items in each group. Mine differ, as I showed in my example (1 in the first column, 1 or more in the second and third) – FábioRB Aug 06 '22 at 16:24