0

I'm actually trying to transform records of column of payment receipt into pandas dataframe. I read the records row by row and determine which data should be in which column. So I created empty dataframe like this:

df=pd.DataFrame
df['QTY']=None
df['Unit Price']=None

and try to fill that empty df like this: In this case, the record is "1X527,000 "

i=0
buff=[]
for line in df1.iterrows():
    if 'X' in line:
        try:
            buff=[float(a.replace(',','').strip()) for a in line]
            df.at[i,'QTY']=buff[0]
            df.at[i,'Unit Price']= buff[1]
        except ValueError:
            break;
    i+=1

I knew it would't work because there's yet no index in the data frame, but I don't know how to make dynamic data frame. Can you help me?

EDIT: This is just sample of the records. I opened using "open" in python because it has no regularity. I put it into other single-column data frame, df1.

7            1X 527,000                             
8           AUS MOL CRLD CD750            527,000  *
9             SUBTOTAL:                   527,000   

EDIT2: This is the working code:

for line in df1.itertuples():
    if 'X' in line[1]:
        try:
            buff=[float(a) for a in line[1].strip().replace(',','').split('X',1)]
            df_dict['QTY'].append(buff[0])
            df_dict['Unit Price'].append(buff[1])

Thanks all!

  • what part of this string `"1X527,000 "` goes into QTY and Unit Price – Joe Ferndz Jan 15 '21 at 21:28
  • Are you looking for something like this `df = pd.DataFrame({'QTY':[200,300,400],'Unit_Price':[10.50,15.25,12.75]})` Where is the value `1X527,000` coming from? Is it in some variable that you need to split and load into the dataframe? – Joe Ferndz Jan 15 '21 at 21:30
  • '1' goes to QTY, '527000' goes to Unit Price. Yes, I need to split it with 'X' because the raw data is "1X527,000 ", and I need to trim it because there are unnecessary trailing spaces and yes, I wanna load it into dataframe. – WeisSchwarz Jan 15 '21 at 21:36
  • Where is the data. Is it in a flat file, a list, or somewhere else? Need more information to help you get to the answer – Joe Ferndz Jan 15 '21 at 21:42
  • Sounds like you can do this with a [delimiter split into multiple columns](https://stackoverflow.com/a/37333352/530160), with a delimiter of 'X'. Then you'd need to address the situation where there's no quantity by detecting a missing value in the unit price column. – Nick ODell Jan 15 '21 at 21:55
  • sorry I've mistaken the first "for loop", it was df, should be df1. I buffer the records in df1. I want to transform it into new clean data frame df which is empty, only filled with column names i.e: "QTY" & "Unit Price". – WeisSchwarz Jan 15 '21 at 22:09

1 Answers1

1

You need to put parenthesis after pd.DataFrame and attend an empty list to columns as below.

df=pd.DataFrame()
df['QTY']=[]
df['Unit Price']=[]

Inside the loop you can append your list with loc method

df.loc[len(df)] = buff

Every time it will append the buff list as a new raw.

An alternative way: You can start a dictionary before the for loop

df_dic = {}
df_dic['QTY'] = []
df_dic['Unit Price'] = []

After that, inside the for loop, you can append your values to lists;

for line in df.iterrows():
    ....
    ....
    df_dic['QTY'].append(buffer[0])
    df_dic['Unit Price'].append(buffer[1])

end of the for loop you can turn dic to data frame

df = pd.DataFrame(df_dic)
benan.akca
  • 101
  • 3