I am trying to analyze a CSV file using dataframes in the pandas library in Python. The file contains data for a chain restaurant and the orders that have been placed there. It's over 5000 lines, so I won't upload the whole thing here, but it mostly looks like this:
order_number | item_name | total_ordered |
---|---|---|
1 | fries | 2 |
1 | burger | 2 |
2 | fries | 1 |
3 | nuggets | 1 |
Basically, the first customer ordered 2 fries and 2 burgers, the second ordered 1 fries, and the third ordered 1 nuggets.
I want to analyze a single order in the dataframe. All of its rows share the same order number, but it contains 4 items intended for 4 different people. In the .csv file, it looks like this:
order_number | item_name | total_ordered |
---|---|---|
1238 | burger | 1 |
1238 | burger | 1 |
1238 | fries | 1 |
1238 | nuggets | 1 |
So far, I have isolated the data for order #1238 in a separate dataframe using the code below, which works fine on its own and produces the above table:
import pandas as pd
restaurant = pd.read_csv("files/restaurant_data.csv")
ordered_items = restaurant.iloc[1504:1508]  # row positions of order 1238 in the table
column_names = ["order_number","item_name", "total_ordered"]
ordered_items.iloc[0:]
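As a side note, the rows for one order can also be picked out by value rather than by hard-coded position. The following is only a minimal sketch: the small `restaurant` dataframe is a hypothetical stand-in for the real CSV (same column names, made-up rows), and the `.copy()` call assumes the subset should be independent of the original data.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV, using the same column names.
restaurant = pd.DataFrame({
    "order_number": [1, 1, 2, 1238, 1238, 1238, 1238],
    "item_name": ["fries", "burger", "fries",
                  "burger", "burger", "fries", "nuggets"],
    "total_ordered": [2, 2, 1, 1, 1, 1, 1],
})

# Boolean mask: keep only the rows whose order_number is 1238.
# .copy() returns an independent dataframe, so later edits to the
# subset do not refer back to `restaurant`.
order_1238 = restaurant[restaurant["order_number"] == 1238].copy()

print(order_1238)
```

Selecting by `order_number` keeps working if rows are added or reordered in the file, which a fixed `iloc` range would not.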
The issue is, I am trying to add another column to the dataframe that includes different customers' names. The names don't come from within the .csv file. I want to manually add them in the code. Ideally it looks like:
order_number | item_name | total_ordered | customer_name |
---|---|---|---|
1238 | burger | 1 | Jenny |
1238 | burger | 1 | Leon |
1238 | fries | 1 | Tuan |
1238 | nuggets | 1 | Victoria |
However, I am getting errors when I try to do this, such as the customer_name column being completely filled with NaN values, or that "A value is trying to be set on a copy of a slice from a DataFrame."
I have tried making a list of the customers for the order 1238 subset of the data, turning it into a pandas Series, and then adding it as the data for the column, like this:
import pandas as pd
restaurant = pd.read_csv("files/restaurant_data.csv")
customers = ["Jenny", "Leon", "Tuan", "Victoria"]
customer_name = pd.Series(customers)
ordered_items = restaurant.iloc[1504:1508]
column_names = ["order_number","item_name", "total_ordered", "customer_name"]
order_1238 = pd.DataFrame(ordered_items, columns=column_names)
order_1238.iloc[0:]
While this adds the customer_name column that I want, it doesn't fill in the data from the list:
order_number | item_name | total_ordered | customer_name |
---|---|---|---|
1238 | burger | 1 | NaN |
1238 | burger | 1 | NaN |
1238 | fries | 1 | NaN |
1238 | nuggets | 1 | NaN |
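A likely explanation for the NaN column, sketched here under assumptions: in the attempt above the `customer_name` Series is never actually assigned to the dataframe, and passing a column name that does not exist to `pd.DataFrame(..., columns=...)` simply creates that column filled with NaN. Even if the Series were assigned, pandas aligns a Series by index label, so a default index of 0 to 3 would not match row labels 1504 to 1507. The `ordered_items` frame below is a hypothetical stand-in for the real slice.

```python
import pandas as pd

# Hypothetical stand-in for the sliced rows; the index mimics 1504-1507.
ordered_items = pd.DataFrame(
    {
        "order_number": [1238, 1238, 1238, 1238],
        "item_name": ["burger", "burger", "fries", "nuggets"],
        "total_ordered": [1, 1, 1, 1],
    },
    index=[1504, 1505, 1506, 1507],
)
customers = ["Jenny", "Leon", "Tuan", "Victoria"]

# 1) Requesting a column that does not exist creates it, filled with NaN.
reindexed = pd.DataFrame(
    ordered_items,
    columns=["order_number", "item_name", "total_ordered", "customer_name"],
)

# 2) A Series with the default index 0..3 is aligned by label, and none of
#    those labels match 1504..1507, so every value becomes NaN.
misaligned = ordered_items.copy()
misaligned["customer_name"] = pd.Series(customers)

# 3) A plain list has no index, so it is assigned by position.
order_1238 = ordered_items.copy()
order_1238["customer_name"] = customers

print(order_1238)
```

If a Series is preferred, giving it the same index as the dataframe (for example `pd.Series(customers, index=ordered_items.index)`) should align correctly as well.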
I've tried some other approaches, like using a dictionary that maps each customer to a row position in the dataset (instead of a list and a pandas Series):
import pandas as pd
restaurant = pd.read_csv("files/restaurant_data.csv")
ordered_items = restaurant.iloc[1504:1508]
ordered_items.loc["customer_name"] = {"Jenny": restaurant.iloc[1504],
"Leon": restaurant.iloc[1505],
"Tuan": restaurant.iloc[1506],
"Victoria": restaurant.iloc[1507]
}
column_names = ["order_number","item_name", "total_ordered", "customer_name"]
order_1238 = pd.DataFrame(ordered_items, columns=column_names)
order_1238.iloc[0:]
However, this code gives me the warning "A value is trying to be set on a copy of a slice from a DataFrame." and then adds an entirely new row with NaN values, which is even further from the expected output. The output looks like:
| | order_number | item_name | total_ordered | customer_name |
| --- | --- | --- | --- | --- |
| | 1238.0 | burger | 1.0 | NaN |
| | 1238.0 | burger | 1.0 | NaN |
| | 1238.0 | fries | 1.0 | NaN |
| | 1238.0 | nuggets | 1.0 | NaN |
| friends_list | NaN | NaN | NaN | NaN |
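The extra row is consistent with how `.loc` works: with a single label, `.loc[label]` addresses a row, and assigning to a label that does not exist appends a new row. The warning typically appears because `ordered_items` is a slice of `restaurant` rather than an independent dataframe. The sketch below illustrates both points on a hypothetical stand-in for the data; the `shake` row is made up purely for the demonstration.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV.
restaurant = pd.DataFrame({
    "order_number": [1, 2, 1238, 1238, 1238, 1238],
    "item_name": ["fries", "fries", "burger", "burger", "fries", "nuggets"],
    "total_ordered": [2, 1, 1, 1, 1, 1],
})
customers = ["Jenny", "Leon", "Tuan", "Victoria"]

# .loc with one label targets a ROW: an unknown label appends a new row.
as_row = restaurant.iloc[2:6].copy()
as_row.loc["customer_name"] = [1238, "shake", 1]  # made-up values

# To add a COLUMN, assign by column name instead. assign() returns a new
# dataframe, so the slice of `restaurant` itself is never modified.
order_1238 = restaurant.iloc[2:6].assign(customer_name=customers)

print(as_row.shape)      # one more row, same number of columns
print(order_1238)
```

Working on a copy (via `.copy()` or `assign`) is one common way to avoid the "copy of a slice" warning, since the new column is then set on an independent dataframe.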
Overall, I'm not sure where I'm going wrong, but I would appreciate any help!