
I am trying to analyze a CSV file using DataFrames from the pandas library in Python. The file contains order data for a chain restaurant. It's over 5000 lines, so I won't upload the whole thing here, but it mostly looks like this:

order_number item_name total_ordered
1 fries 2
1 burger 2
2 fries 1
3 nuggets 1

Basically, the first customer ordered 2 fries and 2 burgers, the second ordered 1 fries, and the third ordered 1 nuggets.

I want to analyze only one order in the dataframe. All of its rows share the same order number, but it contains 4 items intended for 4 different people. In the .csv file, it looks like this:

order_number item_name total_ordered
1238 burger 1
1238 burger 1
1238 fries 1
1238 nuggets 1

So far, I have isolated the data for order # 1238 in a separate dataframe, using this code which works fine on its own and produces the above table:

import pandas as pd

restaurant = pd.read_csv("files/restaurant_data.csv")

ordered_items = restaurant.iloc[1504:1508] #these are the positions of the cells in the table

column_names = ["order_number","item_name", "total_ordered"]

ordered_items.iloc[0:]

The issue is, I am trying to add another column to the dataframe that includes different customers' names. The names don't come from within the .csv file. I want to manually add them in the code. Ideally it looks like:

order_number item_name total_ordered customer_name
1238 burger 1 Jenny
1238 burger 1 Leon
1238 fries 1 Tuan
1238 nuggets 1 Victoria

However, I am getting errors when I try to do this, such as the customer_name column being completely filled with NaN values, or that "A value is trying to be set on a copy of a slice from a DataFrame."

I have tried making a list of the customers involved in the order 1238 subset of the data, turning it into a pandas Series, and then using it as the data for the new column, like this:

import pandas as pd

restaurant = pd.read_csv("files/restaurant_data.csv")

customers = ["Jenny", "Leon", "Tuan", "Victoria"]
customer_name = pd.Series(customers)

ordered_items = restaurant.iloc[1504:1508]

column_names = ["order_number","item_name", "total_ordered", "customer_name"]

order_1238 = pd.DataFrame(ordered_items, columns=column_names)

order_1238.iloc[0:]

While this adds the customer_name column that I want, it doesn't actually fill the data from the list in:

order_number item_name total_ordered customer_name
1238 burger 1 NaN
1238 burger 1 NaN
1238 fries 1 NaN
1238 nuggets 1 NaN

I've tried some other approaches, like using a dictionary that maps each customer to their row position in the dataset (instead of a list and a pandas Series):

import pandas as pd

restaurant = pd.read_csv("files/restaurant_data.csv")

ordered_items = restaurant.iloc[1504:1508]

ordered_items.loc["customer_name"] = {"Jenny": restaurant.iloc[1504], 
                "Leon": restaurant.iloc[1505],
                "Tuan": restaurant.iloc[1506],
                "Victoria": restaurant.iloc[1507]
                 }

column_names = ["order_number","item_name", "total_ordered", "customer_name"]

order_1238 = pd.DataFrame(ordered_items, columns=column_names)

order_1238.iloc[0:]

However, this code gives me the warning "A value is trying to be set on a copy of a slice from a DataFrame." and then adds an entirely new row with NaN values, which is even further from the expected output. The output looks like:

|              | order_number | item_name | total_ordered | customer_name |
| ------------ | ------------ | --------- | ------------- | ------------- |
|              | 1238.0       | burger    | 1.0           | NaN           |
|              | 1238.0       | burger    | 1.0           | NaN           |
|              | 1238.0       | fries     | 1.0           | NaN           |
|              | 1238.0       | nuggets   | 1.0           | NaN           |
| friends_list | NaN          | NaN       | NaN           | NaN           |

Overall, I'm not sure where I'm going wrong, but I would appreciate any help!

FJJ

1 Answer


You can add the new column like this:

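import pandas as pd

restaurant = pd.read_csv("files/restaurant_data.csv")

# One name per row of the order, in row order
customers = ["Jenny", "Leon", "Tuan", "Victoria"]

# Rows 1504-1507 hold order 1238; .copy() avoids the "copy of a slice" warning
ordered_items = restaurant.iloc[1504:1508].copy()

column_names = ["order_number", "item_name", "total_ordered", "customer_name"]

# customer_name does not exist yet, so it starts out filled with NaN
order_1238 = pd.DataFrame(ordered_items, columns=column_names)

# Assigning the list fills the column by position
order_1238["customer_name"] = customers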

The problem with the first approach was that you weren't actually adding the column values to your dataframe. Just because the column name and the variable name are the same doesn't mean pandas will automatically use those values, which is why you were seeing NaN values in the customer_name column. Assigning the list customers to that column fills it with the corresponding values.
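To see the difference without the original file, here is a minimal sketch that uses a small made-up dataframe in place of restaurant_data.csv (the eight rows and the positions 4:8 are assumptions for illustration only, not your real data). It also shows why assigning pd.Series(customers) directly would probably still give NaN: a Series is matched to the rows by index label, and its default labels 0 to 3 don't exist in the slice.

```python
import pandas as pd

# Hypothetical stand-in for restaurant_data.csv; rows 4-7 play the role of order 1238
restaurant = pd.DataFrame({
    "order_number": [1, 1, 2, 3, 1238, 1238, 1238, 1238],
    "item_name": ["fries", "burger", "fries", "nuggets",
                  "burger", "burger", "fries", "nuggets"],
    "total_ordered": [2, 2, 1, 1, 1, 1, 1, 1],
})

customers = ["Jenny", "Leon", "Tuan", "Victoria"]

# .copy() gives an independent dataframe, so adding a column
# does not raise the "copy of a slice" warning
order_1238 = restaurant.iloc[4:8].copy()

# A plain list is assigned by position: first name to first row, and so on
order_1238["customer_name"] = customers

# A Series is aligned on index labels instead. Its labels are 0-3,
# the slice's labels are 4-7, so nothing matches and every value is NaN
misaligned = restaurant.iloc[4:8].copy()
misaligned["customer_name"] = pd.Series(customers)

# Giving the Series the slice's own index makes the labels line up again
realigned = restaurant.iloc[4:8].copy()
realigned["customer_name"] = pd.Series(customers, index=realigned.index)

print(order_1238)
```

If you would rather not depend on row positions, selecting with restaurant[restaurant["order_number"] == 1238].copy() should give the same four rows, assuming that order number appears nowhere else in the file.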

P. Shroff