4

I want to add the codes to the dataframe dictionary.

 codes = ['01', '02', '03', '05', '06', '08', '10', '11', '13', '15', '17', '19', '21', '23', '25', '27', '29', '31', '33', '35', '37', '39', '43', '45', '4.55', '48', '52']
 # 27 codes

 df = pd.read_excel(sales, sheet_name=None, ignore_index=True, skiprows=7)
 #27 Sheets
 for i in codes:
      for key in df.keys():
          df['Sheet1']['Code'] = i

I can't figure out why I end up with the same `i` in every dataframe. I think I understand why, but I can't figure out how to correct it. I am a beginner at coding.

Expected output:

df['Sheet1']

   Date         Particulars    Inwards  Code

1 2017-04-01         EFG           12800    01
2 2017-07-22         ABC           100      01
3 2017-09-05         BCD           10000    01
4 2018-03-13         ABC           2000     01

Code column should be 02 in the next dataframe and so on.

After this I want to concat the dataframes, group by Particulars, and then write to Excel.

    You're iterating over all codes and all keys where *I think* you just want all pairs. So something like `for i, key in zip(codes, df.keys()): df[key]['Code'] = i` may be what you want. – pault Apr 06 '18 at 15:27
  • you will need multiple index – BENY Apr 06 '18 at 15:27
  • @pault thank you so much. I have been at this for almost 5 hours. Please add it as an answer, also if a little more explanation would be possible I would highly enjoy learning more. – Sid Apr 06 '18 at 15:33

2 Answers

5

You can use a dictionary comprehension for this:

df = {k: v.assign(Code=x) for x, (k, v) in zip(codes, df.items())}

pd.DataFrame.assign lets you add a column with a fixed value; it returns a new DataFrame rather than modifying the original.
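As a minimal sketch of this comprehension (the sheet names and values below are made up for illustration; in the question the dict comes from read_excel with sheet_name=None):

```python
import pandas as pd

# Hypothetical stand-in for the dict of DataFrames that
# pd.read_excel(..., sheet_name=None) returns.
df = {
    "Sheet1": pd.DataFrame({"Inwards": [12800, 100]}),
    "Sheet2": pd.DataFrame({"Inwards": [500]}),
}
codes = ["01", "02"]

# zip pairs each code with each (sheet_name, frame) item;
# assign() returns a copy with the constant Code column added.
df = {k: v.assign(Code=x) for x, (k, v) in zip(codes, df.items())}
```

Because assign copies rather than mutating, this also avoids SettingWithCopy warnings you can hit when writing columns into slices.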

jpp
  • Thanks @jpp this works but which would be considered a better option between this and the one in the comment? – Sid Apr 06 '18 at 15:39
  • pault's answer is likely more efficient. But if performance is not an issue, choose what makes more sense to you. – jpp Apr 06 '18 at 15:40
4

When reading a workbook with multiple sheets, pandas.read_excel returns a dictionary of DataFrames where the keys of the dictionary are the names of the sheets.

It seems like you want to add a column code to each DataFrame based on the values in a list.

Your code:

for i in codes:
    for key in df.keys():
        df['Sheet1']['Code'] = i

has two issues. First, inside the loop you never use key; you always access "Sheet1", so only that sheet is modified and it ends up holding the last code. Second, this is a double for loop, which means it iterates over every sheet for every code.

What you want instead is to loop over the values in parallel. Basically you want to do the following:

keys = list(df.keys())  # keys() isn't indexable in Python 3, so make a list
for i in range(len(codes)):
    code = codes[i]
    key = keys[i]
    df[key]['Code'] = code

This is exactly what zip() does. So, you can write the above loop more compactly as:

for code, key in zip(codes, df.keys()):
    df[key]['Code'] = code

Note: this assumes that the length of codes is equal to the number of keys in the dictionary df.
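Here is the zip loop run end to end on a toy sheet dict (sheet names and values are assumptions for illustration; in the question the dict comes from read_excel):

```python
import pandas as pd

# Toy stand-in for the dict of sheets returned by
# pd.read_excel(..., sheet_name=None).
df = {
    "Sheet1": pd.DataFrame({"Inwards": [12800, 100]}),
    "Sheet2": pd.DataFrame({"Inwards": [500]}),
}
codes = ["01", "02"]

# zip pairs the first code with the first sheet name, the second code
# with the second sheet, and so on. Dicts preserve insertion order in
# Python 3.7+, so this matches the order of the sheets in the workbook.
for code, key in zip(codes, df.keys()):
    df[key]["Code"] = code
```

If codes and the sheets can differ in length, note that zip silently stops at the shorter one, so a length check beforehand is a reasonable safeguard.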

Afterwards, you can concatenate all of the DataFrames using pandas.concat:

combined = pd.concat(df)

Which works because concat:

takes a sequence or mapping of Series, DataFrame

...

If a dict is passed, the sorted keys will be used as the keys argument, unless it is passed, in which case the values will be selected (see below).
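A short sketch of that concat step, using toy frames that already carry their Code column (values assumed for illustration):

```python
import pandas as pd

# Frames as they would look after the Code column has been added.
df = {
    "Sheet1": pd.DataFrame({"Inwards": [12800, 100], "Code": ["01", "01"]}),
    "Sheet2": pd.DataFrame({"Inwards": [500], "Code": ["02"]}),
}

# Passing the dict directly: the sheet names become the outer level
# of a MultiIndex on the combined frame.
combined = pd.concat(df)
```

If you don't want the sheet names in the index, `pd.concat(df, ignore_index=True)` flattens the result to a plain RangeIndex instead.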

pault
  • Thanks @pault, the explanation really helps with remembering and learning. If possible, could you add how to `groupby` ['Particulars']? It returns an object and I can't seem to figure out (even though there are a lot of answers to it) why I can't get a `DataFrame` from it. Basically I want to get all the rows with the same `Particulars` into a new dataframe. I have a way of doing it, but it seemed `groupby` and `sort` should have been used. – Sid Apr 06 '18 at 16:03
    @Sid groupby requires an aggregate function. I think you just want to sort the dataframe, but I'm not exactly clear on your desired output. I think in this case you should ask a new question as that is a separate issue. – pault Apr 06 '18 at 16:05